Naoyuki Onodera | Researchain

Publication

Featured research published by Naoyuki Onodera.

Mathematical Problems in Engineering | 2014

Direct numerical simulation and large eddy simulation on a turbulent wall-bounded flow using lattice Boltzmann method and multiple GPUs

Xian Wang; Yanqin Shangguan; Naoyuki Onodera; H. Kobayashi; Takayuki Aoki

Direct numerical simulation (DNS) and large eddy simulation (LES) were performed on the wall-bounded flow at using lattice Boltzmann method (LBM) and multiple GPUs (Graphic Processing Units). In the DNS, 8 K20M GPUs were adopted. The maximum number of meshes is , which results in the nondimensional mesh size of for the whole solution domain. It took 24 hours for GPU-LBM solver to simulate LBM steps. The aspect ratio of resolution domain was tested to obtain accurate results for DNS. As a result, both the mean velocity and turbulent variables, such as Reynolds stress and velocity fluctuations, perfectly agree with the results of Kim et al. (1987) when the aspect ratios in streamwise and spanwise directions are 8 and 2, respectively. As for the LES, the local grid refinement technique was tested and then used. Using grids and Smagorinsky constant , good results were obtained. The ability and validity of LBM on simulating turbulent flow were verified.

IEEE International Conference on High Performance Computing Data and Analytics | 2014

High-productivity framework on GPU-rich supercomputers for operational weather prediction code ASUCA

Takashi Shimokawabe; Takayuki Aoki; Naoyuki Onodera

Weather prediction codes demand high computational performance to achieve fast, high-resolution simulations. Skillful programming techniques are required to obtain good parallel efficiency on GPU supercomputers. Our framework-based weather prediction code ASUCA has achieved good scalability while hiding the complicated implementation and optimizations required for distributed GPUs, which improves maintainability. ASUCA is a next-generation high-resolution meso-scale atmospheric model being developed by the Japan Meteorological Agency. Our framework automatically translates user-written stencil functions that update grid points and generates both GPU and CPU codes. User-written codes are parallelized by MPI with intra-node GPU peer-to-peer direct access. These codes can easily utilize optimizations such as the overlapping technique, which hides communication overhead behind computation. Our simulations on the GPU-rich supercomputer TSUBAME 2.5 at the Tokyo Institute of Technology have demonstrated good strong and weak scalability, achieving 209.6 TFlops in single precision for our largest model using 4,108 NVIDIA K20X GPUs.

International Conference on Conceptual Structures | 2016

High-productivity Framework for Large-scale GPU/CPU Stencil Applications

Takashi Shimokawabe; Takayuki Aoki; Naoyuki Onodera

A high-productivity framework for multi-GPU and multi-CPU computation of stencil applications is proposed. Our framework is implemented in C++ and CUDA. It automatically translates user-written stencil functions that update a grid point and generates both GPU and CPU codes. Programmers write user code in C++ only, and can execute the translated user code on either multiple multicore CPUs or multiple GPUs with optimization. The user code can be executed on multiple GPUs with an auto-tuning mechanism and the overlapping method, which hides communication cost behind computation. It can also be executed on multiple CPUs with OpenMP. A compressible flow code on GPUs that exploits the optimizations provided by the framework runs 2.7 times faster than the non-optimized version.
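The programming model described in this abstract, in which the user writes a function that updates a single grid point and a driver applies it across the grid, can be illustrated with a minimal sketch. The sketch is in plain Python rather than the framework's C++/CUDA, and the names `diffusion_point` and `apply_stencil` as well as the coefficient `c` are hypothetical and not taken from the framework.

```python
def diffusion_point(f, i, j, c=0.1):
    """User-written stencil (hypothetical): new value of one grid point
    computed from the point itself and its four nearest neighbours."""
    return f[i][j] + c * (
        f[i - 1][j] + f[i + 1][j] + f[i][j - 1] + f[i][j + 1] - 4.0 * f[i][j]
    )


def apply_stencil(stencil, f):
    """Hypothetical driver: applies `stencil` to every interior point.

    The input grid is left untouched and boundary values are copied as-is.
    A real framework would generate this loop for GPUs or CPUs.
    """
    ny, nx = len(f), len(f[0])
    out = [row[:] for row in f]
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            out[i][j] = stencil(f, i, j)
    return out


if __name__ == "__main__":
    grid = [[0.0] * 5 for _ in range(5)]
    grid[2][2] = 1.0
    for row in apply_stencil(diffusion_point, grid):
        print(row)
```

Keeping the per-point update separate from the loop is what would let a framework of this kind swap in a GPU or OpenMP loop without changes to the user function.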

BEAMED ENERGY PROPULSION: Fourth International Symposium on Beamed Energy Propulsion | 2006

Experimental Study of Solar Pumped Laser for Magnesium‐Hydrogen Energy Cycle

Shigeaki Uchida; Takashi Yabe; Yuji Sato; Kunio Yoshida; Akio Ikesue; Tomomasa Ohkubo; Akito Mabuchi; Youichi Ogata; Kenji Nakagawa; Atsushi Ohyama; Naoyuki Onodera; Takahiro Ohishi; Yasuhiro Ohtaka; Yoshiyuki Yamada; Satoshi Ito

This paper describes the initial experiments on a solar pumped laser that is intended to be used as an energy converter for a new reusable energy cycle. Research has been conducted with devices and elements that are compatible with industrial production. A Fresnel lens and Chromium ion co‐doped Nd:YAG ceramic laser media have been employed for the lasing demonstration of a solar pumped laser system for the first time. Basic properties of the system, such as the solar power concentration at the lasing threshold and enhancements of pumping efficiency, have been experimentally measured.

Asian Conference on Supercomputing Frontiers | 2018

Acceleration of Wind Simulation Using Locally Mesh-Refined Lattice Boltzmann Method on GPU-Rich Supercomputers.

Naoyuki Onodera; Yasuhiro Idomura

A real-time simulation of the environmental dynamics of radioactive substances is very important from the viewpoint of nuclear security. Since airflows in large cities are turbulent with Reynolds numbers of several million, large-scale CFD simulations are needed. We developed a CFD code based on the adaptive mesh-refined Lattice Boltzmann Method (AMR-LBM). The AMR method arranges fine grids only in the regions where they are needed, so that a high-resolution analysis covering a global simulation area becomes feasible. The code is developed on the GPU-rich supercomputer TSUBAME3.0 at Tokyo Tech, and the GPU kernel functions are tuned to achieve high performance on the Pascal GPU architecture. The code is validated against a wind tunnel experiment released by the National Institute of Advanced Industrial Science and Technology in Japan. Thanks to the AMR method, the total number of grid points is reduced to less than 10% of that of the fine uniform grid system. Weak scaling performance from 1 node to 36 nodes is examined. The GPUs (NVIDIA TESLA P100) achieved more than 10 times higher node performance than the CPUs (Broadwell).

International Conference on Cluster Computing | 2017

A Stencil Framework to Realize Large-Scale Computations Beyond Device Memory Capacity on GPU Supercomputers

Takashi Shimokawabe; Toshio Endo; Naoyuki Onodera; Takayuki Aoki

Stencil-based applications such as CFD have succeeded in obtaining high performance on GPU supercomputers. The problem sizes of these applications are limited by the GPU device memory capacity, which is typically smaller than the host memory. On GPU supercomputers, a locality improvement technique that combines temporal blocking with memory swapping between host and device enables large computations beyond the device memory capacity. However, because the loop management of temporal blocking with data movement across these memories increases programming difficulty, applying this methodology to real stencil applications demands a substantially higher programming cost. Our high-productivity stencil framework automatically applies temporal blocking to the boundary exchange required for stencil computation and supports automatic memory swapping provided by an MPI/CUDA wrapper library. The framework-based application for airflow in an urban city maintains 80% of its performance even when the problem size is twice the GPU memory capacity, and has demonstrated good weak scalability on the TSUBAME 2.5 supercomputer.
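The idea behind temporal blocking can be illustrated with a one-dimensional sketch: a block is loaded together with a halo as wide as the number of time steps, advanced several steps locally, and only then written back, so each block is moved once per group of steps instead of once per step. This is a plain Python illustration under simplifying assumptions (a 3-point stencil, fixed end values, list slices standing in for host-to-device copies); the function names are hypothetical and this is not the framework's implementation.

```python
def step(u, c=0.25):
    """One sweep of a 3-point stencil; the two end values are held fixed."""
    if len(u) < 3:
        return u[:]
    interior = [u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1])
                for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def run_naive(u, nsteps):
    """Reference: sweep the whole array once per time step."""
    for _ in range(nsteps):
        u = step(u)
    return u


def run_temporal_blocked(u, nsteps, block):
    """Advance `nsteps` steps block by block.

    Each block is copied with a halo of `nsteps` cells per side (clipped at
    the domain ends). Stale halo edges corrupt one more cell per step, so
    after `nsteps` steps exactly the halo is invalid and the block is exact.
    """
    n = len(u)
    out = u[:]
    for s in range(0, n, block):
        e = min(s + block, n)
        lo, hi = max(s - nsteps, 0), min(e + nsteps, n)
        local = u[lo:hi]              # stands in for a host-to-device copy
        for _ in range(nsteps):
            local = step(local)
        out[s:e] = local[s - lo:e - lo]   # write back only the valid part
    return out


if __name__ == "__main__":
    u0 = [0.0] * 16
    u0[5] = 1.0
    print(run_naive(u0, 4) == run_temporal_blocked(u0, 4, 4))
```

The price of the reduced data movement is redundant computation in the halo cells, which grows with the number of steps fused per block.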

VII European Congress on Computational Methods in Applied Sciences and Engineering | 2016

LARGE-SCALE FREE-SURFACE FLOW SIMULATION USING LATTICE BOLTZMANN METHOD ON MULTI-GPU CLUSTERS

Naoyuki Onodera; Kunihide Ohashi

Turbulent free-surface flows around ships strongly affect maneuverability and safety. In order to understand the details of the turbulent flow and surface deformation, it is necessary to carry out high-order accurate and large-scale CFD simulations. We have developed a CFD code based on LBM (Lattice Boltzmann Method) with a single-phase free-surface model. Since violent flows are turbulent with high Reynolds numbers, an LES (Large-Eddy Simulation) model has to be introduced to solve the LBM equation. The coherent-structure Smagorinsky model is a state-of-the-art sub-grid scale model. Since this model is able to determine the model constant locally, it is suitable for large-scale calculations containing complicated solid bodies. Our code is written in CUDA and MPI. The GPU kernel function is tuned to achieve high performance on the TSUBAME 2.5 supercomputer at the Tokyo Institute of Technology. We obtained good scalability in a weak scaling test. Each GPU handles a domain of 192 × 192 × 192, and 27 components are defined at each grid point by the D3Q27 model. A fairly high performance of 809 MLUPS (mega lattice updates per second) is achieved by using 1000 GPUs in single precision. By executing this high-performance computation, a turbulent violent flow simulation with real ship data is performed, and details of turbulent flows and free-surface deformations will be simulated with much higher accuracy than ever before.
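The figures quoted in this abstract can be related with a little arithmetic. The sketch below computes the number of lattice points in one 192 × 192 × 192 subdomain, the memory needed for a single single-precision copy of the 27 D3Q27 components, and the MLUPS metric from its definition. The function names are hypothetical, and the byte count assumes exactly one stored copy of the distribution functions; a real code typically keeps additional arrays.

```python
def points_per_gpu(nx, ny, nz):
    """Number of lattice points in one GPU subdomain."""
    return nx * ny * nz


def distribution_bytes(nx, ny, nz, components=27, bytes_per_value=4):
    """Bytes for ONE copy of the distribution functions (assumption:
    single precision, no second buffer, no other fields)."""
    return points_per_gpu(nx, ny, nz) * components * bytes_per_value


def mlups(lattice_points, steps, seconds):
    """Mega lattice updates per second: points updated per second / 1e6."""
    return lattice_points * steps / seconds / 1.0e6


if __name__ == "__main__":
    n = points_per_gpu(192, 192, 192)
    print(n)                                   # 7077888
    print(distribution_bytes(192, 192, 192))   # 764411904
```

Under these assumptions one copy of the distributions occupies roughly 0.76 GB per GPU.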

Journal of Computational Physics | 2011