Hiroaki Shikano | Researchain

Publication

Featured research published by Hiroaki Shikano.

languages and compilers for parallel computing | 2005

Compiler control power saving scheme for multi core processors

Jun Shirako; Naoto Oshiyama; Yasutaka Wada; Hiroaki Shikano; Keiji Kimura; Hironori Kasahara

With the increase of transistors integrated onto a chip, multi core processor architectures have attracted much attention as a way to achieve high effective performance, shorten development periods, and reduce power consumption. To this end, the compiler for a multi core processor is expected not only to parallelize programs effectively, but also to carefully control the voltage and clock frequency of processors and storage inside an application program. This paper proposes a compilation scheme that reduces power consumption in a multigrain parallel processing environment by controlling the voltage/frequency and power supply of each processor core on a chip. In the evaluation, the OSCAR compiler with the proposed scheme achieves 60.7 percent energy savings for SPEC CFP95 applu without performance degradation on 4 processors, 45.4 percent energy savings for SPEC CFP95 tomcatv with a real-time deadline constraint on 4 processors, and 46.5 percent energy savings for SPEC CFP95 swim with the deadline constraint on 4 processors.
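
The deadline-constrained results above rest on the idea that a core with slack time can run slower at a lower voltage. A minimal sketch of that idea follows, assuming a simple model in which dynamic power scales with V^2 * f. The operating points in LEVELS and the function names are hypothetical and are not taken from the OSCAR compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    freq: float     # relative clock frequency (1.0 = full speed)
    voltage: float  # relative supply voltage (1.0 = nominal)


# Hypothetical operating points; a real chip defines its own table.
LEVELS = [Level(1.0, 1.0), Level(0.5, 0.8), Level(0.25, 0.7)]


def dynamic_energy(cycles, level):
    # Assumed model: power is proportional to V^2 * f and time = cycles / f,
    # so energy is proportional to V^2 * cycles.
    return level.voltage ** 2 * cycles


def pick_level(cycles, deadline):
    """Return the lowest-energy level that still meets the deadline."""
    feasible = [lv for lv in LEVELS if cycles / lv.freq <= deadline]
    if not feasible:
        raise ValueError("deadline cannot be met even at full speed")
    return min(feasible, key=lambda lv: dynamic_energy(cycles, lv))
```

With no slack (deadline equal to the full-speed execution time) the sketch keeps the core at full speed; as the deadline loosens it steps down to slower, lower-voltage levels. Static power and the cost of switching levels are ignored here.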

symposium on vlsi circuits | 2007

Heterogeneous Multiprocessor on a Chip Which Enables 54x AAC-LC Stereo Encoding

Masaki Ito; Takashi Todaka; Takanobu Tsunoda; Hiroshi Tanaka; Tomoyuki Kodama; Hiroaki Shikano; Masafumi Onouchi; Kunio Uchiyama; Toshihiko Odaka; Tatsuya Kamei; Ei Nagahama; Manabu Kusaoke; Yusuke Nitta; Yasutaka Wada; Keiji Kimura; Hironori Kasahara

A heterogeneous multiprocessor on a chip has been designed and implemented. It consists of 2 CPUs and 2 DRPs (Dynamic Reconfigurable Processors). The DRP was designed to achieve high performance in a small area so that it can be integrated on a SoC for embedded systems. The memory architectures of the CPUs and DRPs were unified to improve programming and compiling efficiency. 54x AAC-LC stereo encoding has been enabled with 2 DRPs at 300 MHz and 2 CPUs at 600 MHz.

Heterogeneous Multicore Processor Technologies for Embedded Systems | 2012

Heterogeneous Multicore Processor Technologies for Embedded Systems

Kunio Uchiyama; Fumio Arakawa; Hironori Kasahara; Tohru Nojiri; Hideyuki Noda; Yasuhiro Tawara; Akio Idehara; Kenichi Iwata; Hiroaki Shikano

To satisfy the higher requirements of digitally converged embedded systems, this book describes heterogeneous multicore technology that uses various kinds of low-power embedded processor cores on a single chip. With this technology, heterogeneous parallelism can be implemented on an SoC, and greater flexibility and superior performance per watt can then be achieved. This book defines the heterogeneous multicore architecture and explains in detail several embedded processor cores including CPU cores and special-purpose processor cores that achieve high arithmetic-level parallelism. The authors developed three multicore chips (called RP-1, RP-2, and RP-X) according to the defined architecture with the introduced processor cores. The chip implementations, software environments, and applications running on the chips are also explained in the book.

Provides readers an overview and practical discussion of heterogeneous multicore technologies from both a hardware and software point of view.
Discusses a new, high-performance and energy efficient approach to designing SoCs for digitally converged, embedded systems.
Covers hardware issues such as architecture and chip implementation, as well as software issues such as compilers, operating systems, and application programs.
Describes three chips developed according to the defined heterogeneous multicore architecture, including chip implementations, software environments, and working applications.

asia and south pacific design automation conference | 2008

Software-cooperative power-efficient heterogeneous multi-core for media processing

Hiroaki Shikano; Masaki Ito; Kunio Uchiyama; Toshihiko Odaka; Akihiro Hayashi; Takeshi Masuura; Masayoshi Mase; Jun Shirako; Yasutaka Wada; Keiji Kimura; Hironori Kasahara

A heterogeneous multi-core processor (HMCP) architecture, which integrates general purpose processors (CPU) and accelerators (ACC) to achieve high performance as well as low power consumption with the support of a parallelizing compiler, was developed. The evaluation was performed using an MP3 audio encoder on a simulator that accurately models the HMCP. It showed that 16-frame encoding on the HMCP with four CPUs and four ACCs yielded a 24.5-fold speed-up over sequential execution on one CPU. Furthermore, power saving by the compiler reduced the energy consumption of the encoding by 28.4%, to 0.17 J.
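
As a quick check on the energy figures above, the energy without compiler power saving can be recovered from the reported 0.17 J and 28.4%. This is a sketch that assumes the 28.4% reduction is measured relative to the same encoding run without power control; the abstract does not state the baseline explicitly, and the function name is illustrative.

```python
def baseline_from_reduction(energy_after, fraction_saved):
    """Recover the pre-saving energy from the post-saving energy.

    Assumes energy_after = baseline * (1 - fraction_saved).
    """
    if not 0.0 <= fraction_saved < 1.0:
        raise ValueError("fraction_saved must be in [0, 1)")
    return energy_after / (1.0 - fraction_saved)


# Figures quoted in the abstract: 0.17 J after a 28.4% reduction.
baseline = baseline_from_reduction(0.17, 0.284)
print(f"implied baseline energy: {baseline:.3f} J")
```

Under that assumption the implied baseline is roughly 0.24 J.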

ieee international conference on high performance computing data and analytics | 2005

Performance evaluation of compiler controlled power saving scheme

Jun Shirako; Munehiro Yoshida; Naoto Oshiyama; Yasutaka Wada; Hirofumi Nakano; Hiroaki Shikano; Keiji Kimura; Hironori Kasahara

Multicore processors, or chip multiprocessors, which allow us to realize low power consumption, high effective performance, good cost performance and short hardware/software development periods, are attracting much attention. To achieve the full potential of multicore processors, cooperation with a parallelizing compiler is very important. The latest compiler extracts multilevel parallelism, such as coarse grain task parallelism, loop parallelism and near fine grain parallelism, to keep parallel execution efficiency high. It also carefully controls the voltage and clock frequency of processors to reduce energy consumption during execution of an application program. This paper evaluates the performance of a compiler controlled power saving scheme which has been implemented in the OSCAR multigrain parallelizing compiler. The developed power saving scheme realizes voltage/frequency control and power shutdown of each processor core during coarse grain task parallel processing. In the performance evaluation, when static power is assumed to be one-tenth of dynamic power, the OSCAR compiler with the power saving scheme achieved 61.2 percent energy reduction for SPEC CFP95 applu without performance degradation on 4 processors, and 87.4 percent energy reduction for mpeg2encode, 88.1 percent energy reduction for SPEC CFP95 tomcatv and 84.6 percent energy reduction for applu with a real-time deadline constraint on 4 processors.

Archive | 2012

Heterogeneous Multicore Architecture

Kunio Uchiyama; Fumio Arakawa; Hironori Kasahara; Tohru Nojiri; Hideyuki Noda; Yasuhiro Tawara; Akio Idehara; Kenichi Iwata; Hiroaki Shikano

In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater flexibility, it is necessary to develop parallel processing on chips by taking advantage of the advances being made in semiconductor integration. Figure 2.1 illustrates the basic architecture of our heterogeneous multicore [1, 2]. Several low-power CPU cores and special purpose processor (SPP) cores, such as a digital signal processor, a media processor, and a dynamically reconfigurable processor, are embedded on a chip. In the figure, the number of CPU cores is m. There are two types of SPP cores, SPPa and SPPb, on the chip. The values n and k represent the respective numbers of SPPa and SPPb cores.

Each processor core includes a processing unit (PU), a local memory (LM), and a data transfer unit (DTU) as the main elements. The PU executes various kinds of operations. For example, in a CPU core, the PU includes arithmetic units, register files, a program counter, control logic, etc., and executes machine instructions. With some SPP cores like the dynamically reconfigurable processor, the PU processes a large quantity of data in parallel using its array of arithmetic units.

The LM is a small, low-latency memory that is mainly accessed by the PU in the same core during the PU’s execution. Some cores may have caches as well as an LM or may only have caches without an LM. The LM is necessary to meet the real-time requirements of embedded systems: the access time to a cache is non-deterministic because of cache misses, whereas the access time to an LM is deterministic. By putting a program and data in the LM, we can accurately estimate the execution cycles of a program that has hard real-time requirements.

A DTU is also embedded in the core so that internal operations in the core can run in parallel with data transfers between cores and memories. Each PU in a core processes the data on its LM or its cache, and the DTU simultaneously executes memory-to-memory data transfer between cores. The DTU is like a direct memory access controller (DMAC) and executes commands that transfer data between several kinds of memories, check and wait for the end of a data transfer, etc. Some DTUs are capable of command chaining, where multiple commands are executed in order.

The frequency and voltage controller (FVC) connected to each core controls the frequency, voltage, and power supply of each core independently and reduces the total power consumption of the chip. If the frequencies or power supplies of the core’s PU, DTU, and LM can be independently controlled, the FVC can vary their frequencies and power supplies individually. For example, the FVC can stop the clock of the PU while keeping the DTU and LM clocked when the core is executing only data transfers.

The on-chip shared memory (CSM) is a medium-sized on-chip memory that is commonly used by cores. Each core is connected to the on-chip interconnect, which may be several types of buses or crossbar switches. The chip is also connected to the off-chip main memory, which has a large capacity but high latency.
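
The DTU command chaining described above can be pictured with a small software model. This is only an illustrative sketch: the class and field names (TransferCommand, Dtu, "CSM", "LM0") are hypothetical, memories are modeled as byte arrays, and a real DTU runs concurrently with the PU rather than inside a Python loop.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TransferCommand:
    src: str      # name of the source memory, e.g. "CSM"
    src_off: int  # byte offset in the source memory
    dst: str      # name of the destination memory, e.g. "LM0"
    dst_off: int  # byte offset in the destination memory
    length: int   # number of bytes to copy


class Dtu:
    """Toy model of a data transfer unit with command chaining."""

    def __init__(self, memories):
        self.memories = memories  # maps memory name -> bytearray
        self.chain = deque()      # pending commands, oldest first

    def enqueue(self, cmd):
        self.chain.append(cmd)

    def run_chain(self):
        """Execute all queued commands strictly in order; return the count."""
        done = 0
        while self.chain:
            c = self.chain.popleft()
            src, dst = self.memories[c.src], self.memories[c.dst]
            if c.src_off + c.length > len(src) or c.dst_off + c.length > len(dst):
                raise IndexError("transfer exceeds memory bounds")
            dst[c.dst_off:c.dst_off + c.length] = src[c.src_off:c.src_off + c.length]
            done += 1
        return done
```

Because the chain is executed in order, a later command can safely depend on data moved by an earlier one, for example copying a block from the shared memory into a local memory and then duplicating it within that local memory.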

international conference on parallel architectures and compilation techniques | 2007

Power-Aware Compiler Controllable Chip Multiprocessor

Hiroaki Shikano; Jun Shirako; Yasutaka Wada; Keiji Kimura; Hironori Kasahara

Chip multi-processors (CMP) have attracted much attention since they achieve higher performance not by raising operating frequency but by utilizing a number of transistors in parallel. However, simply increasing the number of processor elements (PE) raises power consumption. This work presents a power-aware compiler controllable heterogeneous CMP and its performance and power evaluation with the OSCAR (optimally scheduled advanced multiprocessor) parallelizing compiler (K. Ishizaka et al., 2004).

international conference on data communication networking | 2017

QoS Analysis on Cable Video Delivery Networks.

Hiroaki Shikano; Takeshi Shibata; Qifeng Shen; Miyuki Hanaoka; Mariko Miyaki; Masahide Ban; Prasad V. Rallapalli; Yukinori Sakashita

Telecommunication service providers are now using analytics to differentiate their network services with better Quality of Service (QoS) and Quality of Experience (QoE) as well as to increase operational efficiency. Cable service providers have complex video delivery IP networks, and efficient management of such networks as well as reduction of service impairment time are imperative. We have developed a new cable service QoS analysis solution that provides an end-to-end view of the control and data-plane flows associated with delivery of a video channel and shows identified failure points. Initial evaluation revealed that the solution can reduce the service down time to 62.3% of what it would be without the solution.

acm symposium on applied computing | 2013

Study on supporting technology for operational procedure design of IT systems in cloud-era datacenters

Hiroaki Shikano; Machiko Asaie; Junji Yamamoto; Tatsuya Saito; Shunsuke Ota; Keitaro Uehara

Datacenters are now widely used, and their sizes are increasing due to the rapid spread of cloud computing. Meanwhile, the cost of IT-system operations accounts for 60% of the total cost in a datacenter. So far, efforts to improve operations have focused on automating them with operation management middleware. However, designing operational procedures, including composing and modifying operational manuals and checklists, is also a big issue since it is time consuming. Supporting technology that improves the efficiency of IT-system operational procedure design has been studied by focusing on commonly used procedural parts. To evaluate the technology, existing checklists were analyzed to extract the iterated patterns of the procedures. The evaluation results showed that common operations, such as those on management middleware and remote log-in, were extracted as procedural parts. By using these procedural parts, the number of checklist composition steps was reduced by 45% on average and the number of modification steps was reduced by 87% on average. A prototype tool was developed, and the checklists of a sample system different from the analyzed one were implemented on the tool.
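
The abstract does not describe how the iterated patterns were extracted. As one possible illustration, the sketch below counts step sequences of a fixed length that recur across checklists; this n-gram counting is a stand-in technique, and the function name, parameters, and sample steps are all hypothetical.

```python
from collections import Counter


def common_parts(checklists, length=2, min_count=2):
    """Return step sequences of `length` that occur in at least
    `min_count` different checklists, most frequent first."""
    counts = Counter()
    for steps in checklists:
        seen = set()
        for i in range(len(steps) - length + 1):
            seen.add(tuple(steps[i:i + length]))
        counts.update(seen)  # each checklist counts once per sequence
    return [part for part, n in counts.most_common() if n >= min_count]


# Hypothetical checklists, each a list of operation steps.
sample = [
    ["login", "open console", "restart service", "logout"],
    ["login", "open console", "check log", "logout"],
    ["login", "backup db", "logout"],
]
print(common_parts(sample))
```

In this sample only the pair ("login", "open console") appears in more than one checklist, so it is the single candidate procedural part.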

Archive | 2012

Application Programs and Systems

Kunio Uchiyama; Fumio Arakawa; Hironori Kasahara; Tohru Nojiri; Hideyuki Noda; Yasuhiro Tawara; Akio Idehara; Kenichi Iwata; Hiroaki Shikano

This chapter describes the evaluation of a heterogeneous multicore architecture using a widely used Advanced Audio Coding (AAC) [1] audio encoder implemented on a fabricated chip. AAC is supported for audio playback by various embedded systems. The processing scheme on the heterogeneous multicore architecture with support of hierarchical memories and data transfer units was newly investigated, and the execution time and power consumption of the encoding were measured.
