Munehiro Doi | Researchain

Publication

Featured researches published by Munehiro Doi.

acm multimedia | 2007

Multilevel parallelization on the cell/B.E. for a motion JPEG 2000 encoding server

Hidemasa Muta; Munehiro Doi; Hiroki Nakano; Yumi Mori

The Cell Broadband Engine (Cell/B.E.) is a novel multi-core microprocessor designed to provide high-performance processing capabilities for a wide range of applications. In this paper, we describe the world's first JPEG 2000 and Motion JPEG 2000 encoder on the Cell/B.E. We propose novel parallelization techniques for a Motion JPEG 2000 encoder that unleash the performance of the Cell/B.E. Our Motion JPEG 2000 encoder consists of multiple video frame encoding servers on a cluster system for high-level parallelization. Each video frame encoding server runs on a heterogeneous multi-core Cell/B.E. processor and uses its 8 Synergistic Processor Elements (SPEs) for low-level parallelization of the time-consuming parts of the JPEG 2000 encoding process, such as the wavelet transform, the bit modeling, and the arithmetic coding. The effectiveness of high-level parallelization by the cluster system is also described, not only for parallel encoding, but also for scalable performance improvement for real-time encoding and future enhancements. We developed all of the code from scratch for effective multilevel parallelization. Our results show that the Cell/B.E. is extremely efficient for this workload compared with commercially available processors, and thus we conclude that the Cell/B.E. is quite suitable for encoding next-generation large pixel formats, such as 4K/2K-Digital Cinema.

international conference on multimedia and expo | 2006

Cell-Broadband-Engine-Based Realtime Wavelet Decomposition for HDTV Video Images and Beyond

Akihiro Asahara; Munehiro Doi; Yumi Mori; Hiroki Nishiyama; Hiroki Nakano

The Cell Broadband Engine (CBE) is a novel multi-core microprocessor designed to provide compact and high-performance processing capabilities for a wide range of applications. Real-time image processing applications that exploit parallelism over large amounts of data are good examples to demonstrate the unique capabilities of the CBE. In this paper, we evaluate the performance of image processing using wavelet transforms on the CBE. Our results show that the CBE is extremely efficient in this processing compared with commercially available processors, and thus we conclude that the CBE is quite suitable for next-generation large pixel formats, such as 4K/2K-digital cinema.
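
For readers unfamiliar with wavelet decomposition, the following is a minimal sketch of one decomposition level on a 1D signal. The abstract does not say which wavelet filter was evaluated; this sketch assumes the reversible 5/3 lifting filter defined for JPEG 2000, with symmetric extension at the borders and an even-length input. The function names are hypothetical and the code is plain Python, not the authors' CBE implementation.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting transform on an even-length list of ints.

    Returns (s, d): the low-pass (approximation) and high-pass (detail) halves.
    """
    n = len(x)
    if n < 2 or n % 2 != 0:
        raise ValueError("this sketch only handles even-length signals")
    half = n // 2
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = []
    for k in range(half):
        right = x[2 * k + 2] if 2 * k + 2 < n else x[n - 2]  # symmetric extension
        d.append(x[2 * k + 1] - (x[2 * k] + right) // 2)
    # Update step: each even sample plus a rounded quarter of the adjacent details.
    s = []
    for k in range(half):
        left = d[k - 1] if k > 0 else d[0]  # symmetric extension
        s.append(x[2 * k] + (left + d[k] + 2) // 4)
    return s, d


def inverse_53(s, d):
    """Undo forward_53 exactly by reversing the two lifting steps."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    # Undo the update step to recover the even samples.
    for k in range(half):
        left = d[k - 1] if k > 0 else d[0]
        x[2 * k] = s[k] - (left + d[k] + 2) // 4
    # Undo the predict step to recover the odd samples.
    for k in range(half):
        right = x[2 * k + 2] if 2 * k + 2 < n else x[n - 2]
        x[2 * k + 1] = d[k] + (x[2 * k] + right) // 2
    return x
```

Each output coefficient depends only on a few neighbouring samples, so rows and columns of an image can be transformed independently, which is the kind of data parallelism the abstract refers to.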

international parallel and distributed processing symposium | 2012

Automatic Resource Scheduling with Latency Hiding for Parallel Stencil Applications on GPGPU Clusters

Kumiko Maeda; Masana Murase; Munehiro Doi; Hideaki Komatsu; Shigeho Noda; Ryutaro Himeno

Overlapping computations and communication is a key to accelerating stencil applications on parallel computers, especially for GPU clusters. However, such programming is a time-consuming part of the stencil application development. To address this problem, we developed an automatic code generation tool to produce a parallel stencil application with latency hiding automatically from its dataflow model. With this tool, users visually construct the workflows of stencil applications in a dataflow programming model. Our dataflow compiler determines a data decomposition policy for each application, and generates source code that overlaps the stencil computations and communication (MPI and PCIe). We demonstrate two types of overlapping models, a CPU-GPU hybrid execution model and a GPU-only model. We use a CFD benchmark computing 19-point 3D stencils to evaluate our scheduling performance, which results in 1.45 TFLOPS in single precision on a cluster with 64 Tesla C1060 GPUs.
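
The latency-hiding idea in this abstract can be illustrated with a small sketch. It is not the code the authors' tool generates: it assumes a 1D 3-point averaging stencil split across two subdomains, and it uses a Python thread pool as a stand-in for the MPI and PCIe transfers. All function names are hypothetical. Interior points, which need no neighbor data, are computed while the halo values are "in flight"; only the boundary points wait for them.

```python
from concurrent.futures import ThreadPoolExecutor


def stencil_point(left, center, right):
    """3-point averaging stencil (an assumed example kernel)."""
    return (left + center + right) / 3.0


def reference_step(u):
    """One stencil step on the whole array; the two end points are held fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = stencil_point(u[i - 1], u[i], u[i + 1])
    return out


def fetch_halo(neighbor, index):
    """Stand-in for a network or PCIe transfer of one halo value."""
    return neighbor[index]


def overlapped_step(u, split):
    """Same result as reference_step, computed on two subdomains with overlap."""
    left, right = u[:split], u[split:]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("each subdomain needs at least two points")
    with ThreadPoolExecutor(max_workers=2) as pool:
        # 1. Start the halo exchange without waiting for it.
        halo_for_left = pool.submit(fetch_halo, right, 0)
        halo_for_right = pool.submit(fetch_halo, left, len(left) - 1)
        # 2. Compute interior points, which do not depend on the halos.
        new_left, new_right = list(left), list(right)
        for i in range(1, len(left) - 1):
            new_left[i] = stencil_point(left[i - 1], left[i], left[i + 1])
        for i in range(1, len(right) - 1):
            new_right[i] = stencil_point(right[i - 1], right[i], right[i + 1])
        # 3. Finish the points at the subdomain boundary once the halos arrive.
        new_left[-1] = stencil_point(left[-2], left[-1], halo_for_left.result())
        new_right[0] = stencil_point(halo_for_right.result(), right[0], right[1])
    return new_left + new_right
```

For example, `overlapped_step(u, 4)` on an 8-element list gives the same values as `reference_step(u)`; the only difference is the order in which points are computed relative to the transfers.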

computing frontiers | 2011

A parallel programming framework orchestrating multiple languages and architectures

Masana Murase; Hideaki Komatsu; Kumiko Maeda; Shigeho Noda; Munehiro Doi; Ryutaro Himeno

This paper presents a novel parallel programming framework that orchestrates multiple languages such as C, C++, and Fortran and multiple computational architectures such as x86, POWER, and NVIDIA's Fermi to enhance the productivity of parallel stencil application development while supporting high performance. Unlike traditional parallel programming frameworks, our framework provides three unique features: (1) simple meta-level, visual programming to construct workflows of components written in traditional programming languages, (2) optimal component parallelization and resource scheduling with the stencil communication pattern resolution, and (3) automatic network code generation including MPI, sockets, memory copies, and pointer passing. We prototyped incompressible computational fluid dynamics applications and demonstrate the effectiveness of our approach by evaluating our framework.

Archive | 2007