Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Hironori Kasahara is active.

Publication


Featured research published by Hironori Kasahara.


Autonomous Robots | 2002

Humanoid Robots in Waseda University—Hadaly-2 and WABIAN

S. Hashimoto; S. Narita; Hironori Kasahara; K. Shirai; T. Kobayashi; Atsuo Takanishi; Shigeki Sugano; J. Yamaguchi; H. Sawada; H. Takanobu; Koji Shibuya; T. Morita; T. Kurata; N. Onoe; K. Ouchi; T. Noguchi; Y. Niwa; S. Nagayama; H. Tabayashi; I. Matsui; M. Obata; H. Matsuzaki; A. Murasugi; S. Haruyama; T. Okada; Y. Hidaki; Y. Taguchi; K. Hoashi; E. Morikawa; Y. Iwano

This paper describes two humanoid robots developed at the Humanoid Robotics Institute, Waseda University. Hadaly-2 is intended to realize information interaction with humans by integrating environmental recognition with vision, conversation capability (voice recognition, voice synthesis), and gesture behaviors. It also possesses physical interaction functions for direct contact with humans and behaviors that are gentle and safe for humans. WABIAN is a robot with a complete human configuration that is capable of walking on two legs and carrying things as humans do. Furthermore, it has functions for information interaction suited for use at home.


International Solid-State Circuits Conference | 2008

An 8640 MIPS SoC with Independent Power-Off Control of 8 CPUs and 8 RAMs by an Automatic Parallelizing Compiler

Masayuki Ito; Toshihiro Hattori; Yutaka Yoshida; Kiyoshi Hayase; Tomoichi Hayashi; Osamu Nishii; Yoshihiko Yasu; Atsushi Hasegawa; Masashi Takada; Hiroyuki Mizuno; Kunio Uchiyama; Toshihiko Odaka; Jun Shirako; Masayoshi Mase; Keiji Kimura; Hironori Kasahara

Power-efficient SoC design for embedded applications requires several independent power domains so that the power of unused blocks can be turned off. An SoC for mobile phones defines 23 hierarchical power domains, but most of them are assigned to peripheral IPs that mainly use low-leakage high-Vt transistors. Since high-performance multiprocessor SoCs use leaky low-Vt transistors for the CPU sections, saving the leakage power of these CPU sections is a primary objective. We develop an SoC with 8 processor cores and 8 user RAMs (1 per core) targeted at power-efficient, high-performance embedded applications. We assign these 16 blocks to separate power domains so that they can be independently powered off. A resume mode is also introduced in which the power of a CPU is off while its user RAM stays on, for fast resume operation. An automatic parallelizing compiler schedules tasks for each CPU core and also performs power management for each core. With the help of this compiler, each processor core can operate at a different frequency or even dynamically stop its clock, maintaining processing performance while reducing average operating power consumption. The compiler also executes power-off control of unnecessary CPU cores.


Languages and Compilers for Parallel Computing | 2005

Compiler control power saving scheme for multi core processors

Jun Shirako; Naoto Oshiyama; Yasutaka Wada; Hiroaki Shikano; Keiji Kimura; Hironori Kasahara

With the increase in transistors integrated onto a chip, multi-core processor architectures have attracted much attention as a way to achieve high effective performance, shorten development periods and reduce power consumption. To this end, the compiler for a multi-core processor is expected not only to parallelize a program effectively, but also to carefully control the voltage and clock frequency of processors and storage within an application program. This paper proposes a compilation scheme for reducing power consumption in a multigrain parallel processing environment that controls the voltage/frequency and power supply of each processor core on a chip. In the evaluation, the OSCAR compiler with the proposed scheme achieves 60.7 percent energy savings for SPEC CFP95 applu without performance degradation on 4 processors, 45.4 percent energy savings for SPEC CFP95 tomcatv with a real-time deadline constraint on 4 processors, and 46.5 percent energy savings for SPEC CFP95 swim with the deadline constraint on 4 processors.
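
The voltage/frequency control idea above can be illustrated with a toy model. This is a minimal sketch, not the OSCAR compiler's actual algorithm: it assumes a hypothetical fixed set of frequency levels and a simplified energy model in which supply voltage tracks frequency, so energy grows roughly with the square of the frequency.

```python
# Illustrative (not the paper's) compiler-directed frequency selection:
# run each task at the slowest frequency that still meets its deadline.
FREQS = [1.0, 0.75, 0.5, 0.25]  # assumed frequency levels (fractions of full speed)

def pick_frequency(work, deadline):
    """Slowest frequency at which `work` (time units at full speed)
    still finishes by `deadline`; falls back to full speed."""
    for f in sorted(FREQS):          # try slowest first
        if work / f <= deadline:
            return f
    return 1.0

def relative_energy(work, f):
    # E ~ C * V^2 * f * t, with t = work / f and V ~ f  =>  E ~ work * f^2
    return work * f * f
```

Under this model, a task with 2x slack runs at half frequency and uses about a quarter of the energy, which is the kind of saving the deadline-constrained results above exploit.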


IFAC Proceedings Volumes | 1987

Parallel Processing of Robot Motion Simulation

Hironori Kasahara; H. Fujii; Masahiko Iwata

A parallel processing scheme for robot motion simulation is described. It allows the dynamical behaviours of arbitrarily shaped robot arms to be simulated in the minimum processing time on multiprocessor systems with any number of parallel processors. This advantageous feature comes from the use of two optimal multiprocessor scheduling algorithms, DF/IHS and CP/MISF, developed by the authors. Use is made of Walker and Orin's method, based on the Newton-Euler formulae, for the robot dynamics simulation, which involves computing the robot motion (joint displacement, velocity and acceleration) for a given torque or force applied to each joint. The practicality and usefulness of the proposed parallel processing scheme are demonstrated on a prototype multi-microprocessor robot motion simulator. This paper is the first contribution to report a success in efficient parallel processing of robot dynamics simulation on an actual multiprocessor system.
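
The CP/MISF (Critical Path / Most Immediate Successors First) priority rule mentioned above can be sketched as a list-scheduling heuristic. This is an illustrative reconstruction, not the authors' implementation: priority is the longest path to the exit node, with the number of immediate successors as a tie-breaker, and the task-graph representation and helper names are hypothetical.

```python
import heapq

def cp_misf_schedule(tasks, succs, cost, n_procs):
    """tasks: ids; succs: id -> successor ids; cost: id -> time.
    Returns the makespan of a CP/MISF-style list schedule."""
    # Level = longest path (including own cost) down to an exit node.
    level = {}
    def lv(t):
        if t not in level:
            level[t] = cost[t] + max((lv(s) for s in succs[t]), default=0)
        return level[t]
    for t in tasks:
        lv(t)
    preds = {t: 0 for t in tasks}
    for t in tasks:
        for s in succs[t]:
            preds[s] += 1
    # Priority: critical-path length first, then number of immediate successors.
    ready = [(-level[t], -len(succs[t]), t) for t in tasks if preds[t] == 0]
    heapq.heapify(ready)
    proc_free = [0.0] * n_procs      # next free time of each processor
    finish = {}
    while ready:
        _, _, t = heapq.heappop(ready)
        p = min(range(n_procs), key=lambda i: proc_free[i])
        # Predecessors of a ready task have all finished already.
        data_ready = max((finish[q] for q in tasks if t in succs[q]), default=0)
        start = max(proc_free[p], data_ready)
        finish[t] = start + cost[t]
        proc_free[p] = finish[t]
        for s in succs[t]:
            preds[s] -= 1
            if preds[s] == 0:
                heapq.heappush(ready, (-level[s], -len(succs[s]), s))
    return max(finish.values())
```

On a diamond-shaped graph of four unit-cost tasks, two processors give a makespan of 3 against 4 on one processor, the kind of speedup the paper obtains for the Newton-Euler task graph.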


Languages and Compilers for Parallel Computing | 1991

A Multi-Grain Parallelizing Compilation Scheme for OSCAR (Optimally Scheduled Advanced Multiprocessor)

Hironori Kasahara; Hiroki Honda; A. Mogi; A. Ogura; K. Fujiwara; Seinosuke Narita

This paper proposes a multi-grain parallelizing compilation scheme for Fortran programs. The scheme hierarchically exploits parallelism among coarse grain tasks such as loops, subroutines or basic blocks, among medium grain tasks like loop iterations, and among near fine grain tasks like statements. Parallelism among the coarse grain tasks, called macrotasks, is exploited by carefully analyzing control dependences and data dependences. The macrotasks are dynamically assigned to processor clusters to cope with run-time uncertainties, such as conditional branches among the macrotasks and variation of the execution time of each macrotask. This parallel processing of macrotasks is called macro-dataflow computation. A macrotask composed of a Do-all loop, which is assigned onto a processor cluster, is processed in the medium grain in parallel by the processors inside the cluster. A macrotask composed of a sequential loop or a basic block is processed on a processor cluster in the near fine grain using static scheduling. A macrotask composed of a subroutine or a large sequential loop is processed by hierarchically applying macro-dataflow computation inside a processor cluster. Performance of the proposed scheme is evaluated on a multiprocessor system named OSCAR. The evaluation shows that multi-grain parallel processing effectively exploits parallelism from Fortran programs.


Job Scheduling Strategies for Parallel Processing | 1998

Job Scheduling Scheme for Pure Space Sharing Among Rigid Jobs

Kento Aida; Hironori Kasahara; Seinosuke Narita

This paper evaluates the performance of job scheduling schemes for pure space sharing among rigid jobs. Conventionally, job scheduling for pure space sharing among rigid jobs has been performed First Come First Served (FCFS). However, FCFS has the drawback that it cannot utilize processors efficiently. This paper evaluates, by simulation, performance analysis and experiments on a real multiprocessor system, job scheduling schemes proposed to alleviate this drawback. The results showed that Fit Processors First Served (FPFS), which searches the job queue and preferentially dispatches jobs that fit the idle processors, was more effective and more practical than the others.
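
The FPFS policy can be sketched in a few lines: unlike FCFS, which blocks on the head job, FPFS scans down the queue and starts any job whose request fits the currently idle processors. This is a minimal illustration under assumed data structures (a deque of (job_id, procs_needed) pairs and an optional search-depth limit), not the paper's implementation.

```python
from collections import deque

def fpfs_dispatch(queue, idle_procs, depth=None):
    """Scan up to `depth` queued jobs in arrival order and start every
    job that fits the idle processors. Returns (started_ids, idle_left);
    the queue is updated in place, preserving arrival order."""
    started, remaining = [], deque()
    scanned = 0
    while queue and (depth is None or scanned < depth):
        job_id, need = queue.popleft()
        scanned += 1
        if need <= idle_procs:
            idle_procs -= need       # dispatch: job fits the idle processors
            started.append(job_id)
        else:
            remaining.append((job_id, need))
    remaining.extend(queue)          # unscanned tail keeps its order
    queue.clear()
    queue.extend(remaining)
    return started, idle_procs
```

For example, with 6 idle processors and a queue of jobs needing 8, 2 and 4 processors, FCFS starts nothing, while FPFS starts the 2- and 4-processor jobs immediately.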


International Solid-State Circuits Conference | 2007

A 4320MIPS Four-Processor Core SMP/AMP with Individually Managed Clock Frequency for Low Power Consumption

Yutaka Yoshida; Tatsuya Kamei; Kiyoshi Hayase; Shinichi Shibahara; Osamu Nishii; Toshihiro Hattori; Atsushi Hasegawa; Masashi Takada; Naohiko Irie; Kunio Uchiyama; Toshihiko Odaka; Kiwamu Takada; Keiji Kimura; Hironori Kasahara

A 4320MIPS four-core SoC that supports both SMP and AMP for embedded applications is designed in 90nm CMOS. Each processor core can dynamically operate at a different frequency, including a clock stop, while keeping data-cache coherency, to maintain maximum processing performance and reduce average operating power. The 97.6mm2 die achieves a floating-point performance of 16.8GFLOPS.


International Solid-State Circuits Conference | 2010

A 45nm 37.3GOPS/W heterogeneous multi-core SoC

Yoichi Yuyama; Masayuki Ito; Yoshikazu Kiyoshige; Yusuke Nitta; Shigezumi Matsui; Osamu Nishii; Atsushi Hasegawa; Makoto Ishikawa; Tetsuya Yamada; Junichi Miyakoshi; Koichi Terada; Tohru Nojiri; Masashi Satoh; Hiroyuki Mizuno; Kunio Uchiyama; Yasutaka Wada; Keiji Kimura; Hironori Kasahara; Hideo Maejima

We develop a heterogeneous multi-core SoC for applications such as digital TV systems with IP networks (IP-TV), including image recognition and database search. Figure 5.3.1 shows the chip features. This SoC is capable of decoding 1080i audio/video data using part of the SoC (one general-purpose CPU core, a video processing unit called VPU5 and a sound processing unit called SPU) [1]. Four dynamically reconfigurable processors called FE [2] are integrated, with a total theoretical performance of 41.5GOPS and a power consumption of 0.76W. Two 1024-way matrix processors called MX-2 [3] are integrated, with a total theoretical performance of 36.9GOPS and a power consumption of 1.10W. Overall, the performance per watt of our SoC is 37.3GOPS/W at 1.15V, the highest among comparable processors [4–6] excluding special-purpose codecs. The operation granularities of the CPU, FE and MX-2 are 32bit, 16bit and 4bit respectively, so the appropriate processor can be assigned to each task effectively. A heterogeneous multi-core approach is one of the most promising ways to attain high performance at low frequency, and hence low power, for consumer electronics and scientific applications, compared to homogeneous multi-core SoCs [4]. For example, for the image-recognition application in the IP-TV system, the FEs are assigned to calculate the optical flow [7] of VGA (640×480) video data at 15fps, which requires 0.62GOPS. The MX-2s are used for face detection and calculation of feature quantities of the VGA video data at 15fps, which requires 30.6GOPS. In addition, the general-purpose CPU cores are used for database search on the results of the above operations, which requires further enhancement of CPU performance. The automatic parallelizing compiler analyzes the parallelism of the data flow, generates coarse-grain tasks, and schedules the tasks to minimize execution time, considering data transfer overhead, for the general-purpose CPUs and FEs.


Conference on High Performance Computing (Supercomputing) | 1990

Parallel processing of near fine grain tasks using static scheduling on OSCAR (optimally scheduled advanced multiprocessor)

Hironori Kasahara; Hiroki Honda; Seinosuke Narita

The authors propose a compilation scheme for parallel processing of near-fine-grain tasks, each of which consists of several instructions or a statement, on a multiprocessor system called OSCAR. The scheme minimizes synchronization and data transfer overheads and makes optimal use of the registers of each processor by employing a static scheduling algorithm that takes data transfer into account. The scheme can effectively be combined with macro-dataflow computation and loop concurrentization. A compiler using the proposed scheme has been implemented on OSCAR, which was designed to take full advantage of static scheduling. A performance evaluation of the scheme on OSCAR is also described.


Languages and Compilers for Parallel Computing | 2002

Hierarchical parallelism control for multigrain parallel processing

Motoki Obata; Jun Shirako; Hiroki Kaminaga; Kazuhisa Ishizaka; Hironori Kasahara

To improve the effective performance and usability of shared-memory multiprocessor systems, a multi-grain compilation scheme, which hierarchically exploits coarse grain parallelism among loops, subroutines and basic blocks, conventional loop parallelism, and near fine grain parallelism among statements inside a basic block, is important. In order to efficiently use the hierarchical parallelism of each nest level, or layer, in multigrain parallel processing, it is necessary to determine how many processors, or groups of processors, should be assigned to each layer, according to the parallelism of that layer. This paper proposes an automatic hierarchical parallelism control scheme that assigns a suitable number of processors to each layer so that the parallelism of each hierarchy can be used efficiently. Performance of the proposed scheme is evaluated on an IBM RS6000 SMP server with 8 processors using 8 programs from SPEC95FP.
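
The layer-by-layer processor assignment can be illustrated with a toy grouping function. This is a hedged sketch of the idea only: the "prefer evenly sized groups" policy below is an arbitrary illustration, not the paper's actual decision algorithm.

```python
# Illustrative split of N processors into groups for one nest level:
# as many groups as the layer's coarse-grain parallelism supports,
# leaving the processors inside each group for the inner layer.
def group_processors(n_procs, coarse_parallelism):
    """Return (n_groups, procs_per_group), preferring evenly sized groups."""
    n_groups = max(1, min(coarse_parallelism, n_procs))
    while n_procs % n_groups != 0:   # shrink until the split is even
        n_groups -= 1
    return n_groups, n_procs // n_groups
```

For instance, on an 8-processor machine a layer with coarse-grain parallelism 3 would get 2 groups of 4 processors each, with the 4 processors per group available for the inner layer's loop or statement-level parallelism.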

Collaboration


Dive into Hironori Kasahara's collaboration.
