Kazuo Minami
Fujitsu
Publications
Featured research published by Kazuo Minami.
Physical Review B | 2005
Masashi Tachiki; Mikio Iizuka; Kazuo Minami; Syogo Tejima; Hisashi Nakamura
We present a mechanism for the emission of electromagnetic terahertz waves, demonstrated by simulation. High-$T_c$ superconductors form naturally stacked Josephson junctions. When an external current and a magnetic field are applied to the sample, fluxon flow induces a voltage. The voltage creates an oscillating current through the Josephson effect, and the current excites the Josephson plasma. The sample works as a cavity, and the input energy is stored in the form of a standing wave of the Josephson plasma. A part of the energy is emitted as terahertz waves.
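The step from the fluxon-flow voltage to a terahertz oscillation is the AC Josephson relation. As a worked illustration (the voltage value below is a hypothetical example, not a figure from the paper):

f = \frac{2eV}{h} \approx 0.4836\ \mathrm{THz} \times \frac{V}{\mathrm{mV}}, \qquad V \approx 2.07\ \mathrm{mV} \;\Rightarrow\; f \approx 1\ \mathrm{THz}.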
Journal of Chemical Theory and Computation | 2013
Yoshimichi Andoh; Noriyuki Yoshii; Kazushi Fujimoto; Keisuke Mizutani; Hidekazu Kojima; Atsushi Yamada; Susumu Okazaki; Kazutomo Kawaguchi; Hidemi Nagao; Kensuke Iwahashi; Fumiyasu Mizutani; Kazuo Minami; Shin-ichi Ichikawa; Hidemi Komatsu; Shigeru Ishizuki; Yasuhiro Takeda; Masao Fukushima
Our new molecular dynamics (MD) simulation program, MODYLAS, is a general-purpose program appropriate for very large physical, chemical, and biological systems. It is equipped with most standard MD techniques. Long-range forces are evaluated rigorously by the fast multipole method (FMM) without using the fast Fourier transform (FFT). Several new methods have also been developed for extremely fine-grained parallelism of the MD calculation. The virtually buffering-free methods for communications and arithmetic operations, the minimal communication latency algorithm, and the parallel bucket-relay communication algorithm for the upper-level multipole moments in the FMM realize excellent scalability. The methods for blockwise arithmetic operations avoid data reload, attaining very small cache miss rates. Benchmark tests for MODYLAS using 65,536 nodes of the K computer showed that the overall calculation time per MD step, including communications, is as short as about 5 ms for a 10-million-atom system; that is, 35 ns of simulation time can be computed per day. The program enables investigations of large-scale real systems such as viruses, liposomes, assemblies of proteins and micelles, and polymers.
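To make the FMM remark concrete, the sketch below illustrates the far-field idea in its crudest form: particles in distant cells are lumped into a single lowest-order moment (total charge at the charge centroid), while nearby pairs are summed directly, so no FFT is needed. This is a single-level toy under simplifying assumptions (positive charges, monopole term only; the function name fmm_like_potential is ours), not MODYLAS code, which keeps higher-order moments on a hierarchical tree of cells.

# Toy sketch of the far-field idea behind the fast multipole method (FMM).
# Distant particles are lumped into one lowest-order moment per cell
# (total charge at the charge centroid); only nearby cells are summed
# directly. Single flat level of cells, positive charges assumed.
import numpy as np

def fmm_like_potential(pos, q, cell=2.0, near=1):
    """Coulomb-like potential at every particle position."""
    idx = np.floor(pos / cell).astype(int)
    groups = {}
    for i, key in enumerate(map(tuple, idx)):
        groups.setdefault(key, []).append(i)
    # lowest-order multipole per cell: total charge Q and its centroid
    moments = {}
    for key, members in groups.items():
        m = np.array(members)
        Q = q[m].sum()
        moments[key] = (Q, (q[m, None] * pos[m]).sum(axis=0) / Q)
    phi = np.zeros(len(q))
    for i, key in enumerate(map(tuple, idx)):
        for ck, (Q, com) in moments.items():
            if max(abs(a - b) for a, b in zip(ck, key)) <= near:
                for j in groups[ck]:          # near field: direct pair sum
                    if j != i:
                        phi[i] += q[j] / np.linalg.norm(pos[i] - pos[j])
            else:                             # far field: monopole only
                phi[i] += Q / np.linalg.norm(pos[i] - com)
    return phi

# Example: 1,000 unit charges placed randomly in a 10 x 10 x 10 box.
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 10.0, size=(1000, 3))
q = np.ones(1000)
print(fmm_like_potential(pos, q)[:3])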
ieee international conference on high performance computing data and analytics | 2011
Yukihiro Hasegawa; Jun-Ichi Iwata; Miwako Tsuji; Daisuke Takahashi; Atsushi Oshiyama; Kazuo Minami; Taisuke Boku; Fumiyoshi Shoji; Atsuya Uno; Motoyoshi Kurokawa; Hikaru Inoue; Ikuo Miyoshi; Mitsuo Yokokawa
Real-space DFT (RSDFT) is a simulation technique well suited to massively parallel architectures for performing first-principles electronic-structure calculations based on density functional theory. Here we report unprecedented simulations of the electron states of silicon nanowires with up to 107,292 atoms, carried out during the initial performance evaluation phase of the K computer being developed at RIKEN. The RSDFT code has been parallelized and optimized to make effective use of the various capabilities of the K computer. Simulation results for the self-consistent electron states of a silicon nanowire with 10,000 atoms were obtained in a run lasting about 24 hours and using 6,144 cores of the K computer. A 3.08 peta-flops sustained performance was measured for one iteration of the SCF calculation in a 107,292-atom Si nanowire calculation using 442,368 cores, which is 43.63% of the peak performance of 7.07 peta-flops.
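The "self-consistent electron states" above are the fixed point of the SCF iteration, and one pass of that loop is what the 3.08 peta-flops figure measures. The Python sketch below shows the loop's structure on a toy 1-D model; the harmonic potential and the density-as-potential mean-field term are illustrative stand-ins, not the Kohn-Sham equations RSDFT actually solves, and RSDFT diagonalizes iteratively on a real-space grid rather than with a dense eigensolver.

# Minimal sketch of the self-consistent field (SCF) loop that DFT codes
# such as RSDFT iterate until the electron density stops changing.
# Toy 1-D model: finite-difference kinetic term, harmonic external
# potential, and the density itself as a stand-in mean-field term.
import numpy as np

def scf_loop(n_grid=64, n_electrons=4, mix=0.3, tol=1e-8, max_iter=200):
    x = np.linspace(-5.0, 5.0, n_grid)
    dx = x[1] - x[0]
    lap = (np.diag(-2.0 * np.ones(n_grid))
           + np.diag(np.ones(n_grid - 1), 1)
           + np.diag(np.ones(n_grid - 1), -1)) / dx**2
    v_ext = 0.5 * x**2                              # external potential (toy)
    density = np.full(n_grid, n_electrons / (n_grid * dx))
    for iteration in range(max_iter):
        h = -0.5 * lap + np.diag(v_ext + density)   # effective Hamiltonian
        _, psi = np.linalg.eigh(h)                  # dense solve (toy only)
        occ = psi[:, : n_electrons // 2] / np.sqrt(dx)  # grid-normalized orbitals
        new_density = 2.0 * (occ**2).sum(axis=1)    # two electrons per orbital
        if np.abs(new_density - density).max() < tol:
            return new_density, iteration
        density = (1.0 - mix) * density + mix * new_density  # linear mixing
    return density, max_iter

density, n_iter = scf_loop()
print(f"converged after {n_iter} SCF iterations")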
ieee international conference on high performance computing data and analytics | 2015
Tsuyoshi Ichimura; Kohei Fujita; Pher Errol Balde Quinay; Lalith Maddegedara; Muneo Hori; Seizo Tanaka; Yoshihisa Shizawa; Hiroshi Kobayashi; Kazuo Minami
This paper presents a new heroic computing method for unstructured, low-order, finite-element, implicit nonlinear wave simulation: 1.97 PFLOPS (18.6% of peak) was attained on the full K computer when solving a 1.08T degrees-of-freedom (DOF), 0.270T-element problem. This is 40.1 times more DOF and elements, a 2.68-fold improvement in peak performance, and 3.67 times faster time-to-solution compared to the SC14 Gordon Bell finalist's state-of-the-art simulation. The method scales up to the full K computer with 663,552 CPU cores at 96.6% sizeup efficiency, enabling solution of a 1.08T-DOF problem in 29.7 s per time step. Using such heroic computing, we solved a practical problem involving an area 23.7 times larger than the state of the art, and conducted a comprehensive earthquake simulation combining earthquake wave propagation analysis and evacuation analysis. Application at such scale is a groundbreaking accomplishment that is expected to change the quality of earthquake disaster estimation and contribute to society.
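As a quick sanity check, the quoted peak fraction is consistent with the full K computer's 10.62 PFLOPS peak cited elsewhere on this page, and the problem size works out to roughly 1.6 million DOF per core:

\frac{1.97\ \mathrm{PFLOPS}}{10.62\ \mathrm{PFLOPS}} \approx 0.186, \qquad \frac{1.08 \times 10^{12}\ \mathrm{DOF}}{663{,}552\ \mathrm{cores}} \approx 1.6 \times 10^{6}\ \mathrm{DOF/core}.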
ieee international conference on high performance computing data and analytics | 2014
Yukihiro Hasegawa; Jun-Ichi Iwata; Miwako Tsuji; Daisuke Takahashi; Atsushi Oshiyama; Kazuo Minami; Taisuke Boku; Hikaru Inoue; Yoshito Kitazawa; Ikuo Miyoshi; Mitsuo Yokokawa
Silicon nanowires are potentially useful in next-generation field-effect transistors, and it is important to clarify the electron states of silicon nanowires to understand the behavior of such new devices. Computer simulations are promising tools for calculating electron states. The real-space density functional theory (RSDFT) code performs first-principles electronic structure calculations. To obtain higher performance, we applied various optimization techniques to the code: multi-level parallelization, load-balance management, sub-mesh/torus allocation, and a message-passing interface library tuned for the K computer. We measured and evaluated the performance of the modified RSDFT code on the K computer. A 5.48 petaflops (PFLOPS) sustained performance was measured for an iteration of a self-consistent field calculation in a 107,292-atom Si nanowire simulation using 82,944 compute nodes, which is 51.67% of the K computer's peak performance of 10.62 PFLOPS. This scale of simulation enables analysis of the behavior of a silicon nanowire with a diameter of 10-20 nm.
ieee international conference on high performance computing data and analytics | 2016
Kiyoshi Kumahata; Kazuo Minami; Naoya Maruyama
The high-performance conjugate gradient (HPCG) is a new benchmark for supercomputers that provides a more realistic performance metric than existing benchmarks such as LINPACK. HPCG measures the speed of solving a symmetric sparse linear system using the conjugate gradient method preconditioned by a multigrid symmetric Gauss-Seidel smoother. The combination of a sparse linear system and a preconditioned conjugate gradient method is widely used in scientific and engineering applications. This study introduces a tuning method for the K computer. Weak-scaling measurements on the K computer show good parallel scalability, so our tuning strategy focuses on single-CPU performance rather than parallel performance. Single-CPU performance strongly depends on memory throughput and multicore utilization; we therefore improved memory/cache access performance and multithreading efficiency. As a result, the HPCG score obtained with the K computer achieved second place at SC'14.
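The solver pattern HPCG times is exactly this combination: a conjugate gradient loop whose preconditioner applies a symmetric Gauss-Seidel sweep. The Python sketch below is a minimal serial illustration of that pattern (HPCG applies the sweep inside a multigrid cycle and runs distributed; both are omitted here). The function names are ours, and the 1-D Poisson matrix is just a convenient symmetric positive-definite test case.

# Conjugate gradient (CG) on a symmetric sparse system, preconditioned
# by one symmetric Gauss-Seidel sweep: M = (D+L) D^{-1} (D+U).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve_triangular

def sym_gauss_seidel(A, r):
    """Apply M^{-1} r: forward triangular solve, then backward solve."""
    L = sp.tril(A, format="csr")          # lower triangle incl. diagonal
    U = sp.triu(A, format="csr")          # upper triangle incl. diagonal
    D = sp.diags(A.diagonal())
    y = spsolve_triangular(L, r, lower=True)
    return spsolve_triangular(U, D @ y, lower=False)

def pcg(A, b, tol=1e-10, max_iter=500):
    x = np.zeros_like(b)
    r = b - A @ x
    z = sym_gauss_seidel(A, r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = sym_gauss_seidel(A, r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p         # update search direction
        rz = rz_new
    return x

# Example: 1-D Poisson matrix, the classic SPD test problem.
n = 200
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = pcg(A, b)
print(np.linalg.norm(A @ x - b))          # residual should be ~1e-10 or less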
Proceedings of the Platform for Advanced Scientific Computing Conference | 2016
Hisashi Yashiro; Masaaki Terai; Ryuji Yoshida; Shin-ichi Iga; Kazuo Minami; Hirofumi Tomita
We summarize the optimization and performance evaluation of the Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on two different types of supercomputer: the K computer and TSUBAME2.5. First, we evaluated and improved several kernels extracted from the model code on the K computer. We did not significantly change the loop and data ordering, making sufficient use of features of the K computer such as the hardware-aided thread barrier mechanism and the relatively high memory bandwidth (a 0.5 Byte/FLOP ratio). Loop optimizations and code cleaning to reduce memory transfer contributed to a speed-up of the model execution time. The sustained performance of the main loop of NICAM reached 0.87 PFLOPS with 81,920 nodes on the K computer. For GPU-based calculations, we applied OpenACC to the dynamical core of NICAM and evaluated performance and scalability on the TSUBAME2.5 supercomputer. We achieved good performance, showing efficient use of the GPU's memory throughput as well as good weak scalability. A dry dynamical-core experiment using 2,560 GPUs achieved 60 TFLOPS of sustained performance.
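The 0.5 Byte/FLOP figure follows from the K computer's per-node specification of 64 GB/s memory bandwidth against 128 GFLOPS peak (8 SPARC64 VIIIfx cores at 16 GFLOPS each):

\mathrm{B/F} = \frac{64\ \mathrm{GB/s}}{128\ \mathrm{GFLOPS}} = 0.5.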
mining software repositories | 2015
Masatomo Hashimoto; Masaaki Terai; Toshiyuki Maeda; Kazuo Minami
To improve the performance of large-scale scientific applications, scientists and tuning experts make various empirical attempts to change compiler options, program parameters, or even the syntactic structure of programs. These attempts, each followed by a performance evaluation, are repeated until satisfactory results are obtained, so the task of performance tuning requires a great deal of time and effort. Because the space of possible attempts explodes combinatorially, scientists and tuning experts tend to decide what to explore based on intuition or tuning experience alone. We advocate evidence-based performance tuning (EBT), which uses a database of facts extracted from the tuning histories of applications to guide exploration of the search space. In general, however, performance tuning is conducted as a transient task without version control; tuning histories may lack explicit facts about which program transformations contributed to better performance, or even about the chronological order of the source code snapshots. To reconstruct the missing information, we employ a state-of-the-art fine-grained change pattern identification tool that infers applied transformation patterns from an unordered set of source code snapshots. The extracted facts are intended to be stored and queried for further data mining. This paper reports on experiments in tuning pattern identification, followed by predictive model construction, conducted for a few scientific applications tuned for the K supercomputer.
international conference on conceptual structures | 2013
Kiyoshi Kumahata; Shunsuke Inoue; Kazuo Minami
international conference on parallel processing | 2012
Masaaki Terai; Hitoshi Murai; Kazuo Minami; Mitsuo Yokokawa; Eiji Tomiyama