Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Fumiyoshi Shoji is active.

Publication


Featured research published by Fumiyoshi Shoji.


International Symposium on Low Power Electronics and Design | 2011

The K computer: Japanese next-generation supercomputer development project

Mitsuo Yokokawa; Fumiyoshi Shoji; Atsuya Uno; Motoyoshi Kurokawa; Tadashi Watanabe

The K computer is a distributed-memory supercomputer system consisting of more than 80,000 compute nodes, being developed by RIKEN as a Japanese national project. It is targeted to achieve a sustained performance of 10 petaflops on the LINPACK benchmark. The system is under installation and adjustment, and the whole system will be operational in 2012.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer

Yukihiro Hasegawa; Jun-Ichi Iwata; Miwako Tsuji; Daisuke Takahashi; Atsushi Oshiyama; Kazuo Minami; Taisuke Boku; Fumiyoshi Shoji; Atsuya Uno; Motoyoshi Kurokawa; Hikaru Inoue; Ikuo Miyoshi; Mitsuo Yokokawa

Real-space DFT (RSDFT) is a simulation technique well suited to massively parallel architectures for performing first-principles electronic-structure calculations based on density functional theory. Here we report unprecedented simulations of the electron states of silicon nanowires with up to 107,292 atoms, carried out during the initial performance evaluation phase of the K computer being developed at RIKEN. The RSDFT code has been parallelized and optimized to make effective use of the various capabilities of the K computer. Self-consistent electron states of a silicon nanowire with 10,000 atoms were obtained in a run lasting about 24 hours on 6,144 cores of the K computer. A sustained performance of 3.08 petaflops was measured for one iteration of the SCF calculation for a 107,292-atom Si nanowire on 442,368 cores, which is 43.63% of the peak performance of 7.07 petaflops.
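
As a quick sanity check of the performance figures quoted in this abstract, the sustained-to-peak ratio and per-core throughput can be recomputed directly; the short Python sketch below uses only the numbers stated above.

```python
# Arithmetic check of the figures quoted in the abstract above.
sustained_pflops = 3.08   # measured sustained performance
peak_pflops = 7.07        # peak of the 442,368-core partition
cores = 442_368

print(f"Fraction of peak  : {sustained_pflops / peak_pflops:.2%}")          # ~43.6%
print(f"Per-core sustained: {sustained_pflops * 1e6 / cores:.2f} GFLOPS")   # ~7 GFLOPS/core
```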


International Solid-State Circuits Conference | 2012

K computer: 8.162 PetaFLOPS massively parallel scalar supercomputer built with over 548k cores

Hiroyuki Miyazaki; Yoshihiro Kusano; Hiroshi Okano; Tatsumi Nakada; Ken Seki; Toshiyuki Shimizu; Naoki Shinjo; Fumiyoshi Shoji; Atsuya Uno; Motoyoshi Kurokawa

Many high-performance CPUs employ a multicore architecture with a moderate clock frequency and wide instruction issue, including SIMD extensions, to achieve high performance while keeping power consumption practical. As demand for supercomputer performance grows faster than CPU performance improves, the total number of cores in high-end supercomputers has increased tremendously. Efficient handling of large numbers of cores is a key aspect of supercomputer design. Building a supercomputer with low power consumption and high reliability is also important from the viewpoints of cost and availability.
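
The abstract does not spell out how the core count relates to the 8.162 PFLOPS figure. Assuming the published SPARC64 VIIIfx parameters (2 GHz clock, 8 floating-point operations per cycle per core), which are not stated above, the peak and LINPACK efficiency can be estimated as in this rough Python sketch.

```python
# Rough peak-performance estimate for the 8.162 PFLOPS LINPACK run mentioned above.
# Assumes the published SPARC64 VIIIfx figures (2 GHz, 8 FLOPs/cycle/core,
# i.e. 16 GFLOPS per core); these numbers are not given in the abstract itself.
cores = 548_352                  # "over 548k cores"
ghz = 2.0
flops_per_cycle = 8

peak_pflops = cores * ghz * flops_per_cycle / 1e6   # GFLOPS -> PFLOPS
linpack_pflops = 8.162

print(f"Theoretical peak  : {peak_pflops:.3f} PFLOPS")            # ~8.774
print(f"LINPACK efficiency: {linpack_pflops / peak_pflops:.1%}")  # ~93%
```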


Computer Science - Research and Development | 2013

The design of ultra scalable MPI collective communication on the K computer

Tomoya Adachi; Naoyuki Shida; Kenichi Miura; Shinji Sumimoto; Atsuya Uno; Motoyoshi Kurokawa; Fumiyoshi Shoji; Mitsuo Yokokawa

This paper proposes the design of ultra-scalable MPI collective communication for the K computer, which consists of 82,944 compute nodes and is the world's first system to exceed 10 PFLOPS. The nodes are connected by the Tofu interconnect, which provides a six-dimensional mesh/torus topology. Existing MPI libraries, however, perform poorly on such a direct network because they assume typical cluster environments. We therefore design collective algorithms optimized for the K computer. In designing the algorithms, we place importance on collision-freeness for long messages and low latency for short messages. The long-message algorithms use multiple RDMA network interfaces and rely on neighbor communication in order to achieve high bandwidth and avoid message collisions. The short-message algorithms, on the other hand, are designed to reduce software overhead, which grows with the number of relaying nodes. Evaluation results on up to 55,296 nodes of the K computer show that the new implementation outperforms the existing one for long messages by a factor of 4 to 11. They also show that the short-message algorithms complement the long-message ones.
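
For illustration only, the sketch below shows the general idea behind the long-message design described above (neighbor-only communication on a ring to avoid message collisions) next to a simple short-message fallback. It uses mpi4py and NumPy and is not the paper's actual implementation; the message-size threshold is a made-up placeholder.

```python
# Illustrative sketch (not the K computer's implementation) of switching between
# a latency-oriented short-message path and a bandwidth-oriented long-message
# path built from neighbor-only ring communication.  Run under mpirun.
import numpy as np
from mpi4py import MPI

SHORT_MSG_BYTES = 4096  # hypothetical switch-over point between algorithms

def allreduce_sum(comm, data):
    p, r = comm.Get_size(), comm.Get_rank()
    if data.nbytes <= SHORT_MSG_BYTES or p == 1:
        # Short messages: latency dominates, so defer to the library algorithm.
        out = np.empty_like(data)
        comm.Allreduce(data, out, op=MPI.SUM)
        return out

    # Long messages: ring reduce-scatter followed by ring allgather,
    # touching only the left/right neighbors so links carry no colliding traffic.
    chunks = np.array_split(data.copy(), p)
    left, right = (r - 1) % p, (r + 1) % p
    for step in range(p - 1):                        # reduce-scatter phase
        s, d = (r - step) % p, (r - step - 1) % p
        recv = np.empty_like(chunks[d])
        comm.Sendrecv(chunks[s], dest=right, recvbuf=recv, source=left)
        chunks[d] = chunks[d] + recv
    for step in range(p - 1):                        # allgather phase
        s, d = (r - step + 1) % p, (r - step) % p
        recv = np.empty_like(chunks[d])
        comm.Sendrecv(chunks[s], dest=right, recvbuf=recv, source=left)
        chunks[d] = recv
    return np.concatenate(chunks)

comm = MPI.COMM_WORLD
x = np.full(1 << 16, comm.Get_rank(), dtype=np.float64)   # 512 KiB per rank
y = allreduce_sum(comm, x)
assert np.allclose(y, sum(range(comm.Get_size())))
```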


International Conference on Conceptual Structures | 2014

The K computer Operations: Experiences and Statistics

Keiji Yamamoto; Atsuya Uno; Hitoshi Murai; Toshiyuki Tsukamoto; Fumiyoshi Shoji; Shuji Matsui; Ryuichi Sekizawa; Fumichika Sueyasu; Hiroshi Uchiyama; Mitsuo Okamoto; Nobuo Ohgushi; Katsutoshi Takashina; Daisuke Wakabayashi; Yuki Taguchi; Mitsuo Yokokawa

The K computer, released on September 29, 2012, is a large-scale parallel supercomputer system consisting of 82,944 compute nodes. We have resolved a significant number of operational issues since its release. Some system software components have been fixed and improved to achieve higher stability and utilization. We achieved 94% service availability thanks to a low hardware failure rate, and approximately 80% node utilization through careful adjustment of operation parameters. We found the K computer to be an extremely stable system with high utilization.
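
The two headline metrics above (service availability and node utilization) are simple ratios over the operation period; the sketch below shows how they could be derived from hypothetical outage and job records, purely for illustration.

```python
# Minimal sketch of the two operational metrics quoted in the abstract above.
# The in-line "logs" are hypothetical placeholders; only the formulas reflect
# the metrics described in the paper.
TOTAL_NODES = 82_944
PERIOD_HOURS = 30 * 24                      # one month, for illustration

# Hypothetical full-system outage windows (hours of lost service).
outages_h = [6.0, 12.0, 25.2]
availability = 1.0 - sum(outages_h) / PERIOD_HOURS

# Hypothetical job records: (nodes used, elapsed hours); real logs hold many more.
jobs = [(36_864, 24.0), (82_944, 8.0), (12_288, 96.0)]
node_hours_used = sum(n * h for n, h in jobs)
utilization = node_hours_used / (TOTAL_NODES * PERIOD_HOURS)

print(f"Service availability: {availability:.1%}")
print(f"Node utilization    : {utilization:.1%}")
```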


Journal of Synchrotron Radiation | 2013

High‐speed classification of coherent X‐ray diffraction patterns on the K computer for high‐resolution single biomolecule imaging

Atsushi Tokuhisa; Junya Arai; Yasumasa Joti; Yoshiyuki Ohno; Toyohisa Kameyama; Keiji Yamamoto; Masayuki Hatanaka; Balazs Gerofi; Akio Shimada; Motoyoshi Kurokawa; Fumiyoshi Shoji; Kensuke Okada; Takashi Sugimoto; Mitsuhiro Yamaga; Ryotaro Tanaka; Mitsuo Yokokawa; Atsushi Hori; Yutaka Ishikawa; Takaki Hatsui; Nobuhiro Go

A code with an algorithm for high-speed classification of X-ray diffraction patterns has been developed. Results obtained for a set of 1 × 10^6 simulated diffraction patterns are also reported.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2018

Improving Collective MPI-IO Using Topology-Aware Stepwise Data Aggregation with I/O Throttling

Yuichi Tsujita; Atsushi Hori; Toyohisa Kameyama; Atsuya Uno; Fumiyoshi Shoji; Yutaka Ishikawa

MPI-IO is used as the internal I/O layer of HDF5 and PnetCDF, where collective MPI-IO plays a major role in parallel I/O for managing large-scale scientific data. However, the existing collective MPI-IO optimization known as two-phase I/O has not been tuned sufficiently for recent supercomputers with mesh/torus interconnects and very large parallel file systems, owing to the lack of topology awareness in data transfers and of optimization for the file systems. In this paper, we propose I/O throttling and topology-aware stepwise data aggregation in the two-phase I/O of ROMIO, a representative MPI-IO library, in order to improve collective MPI-IO performance even when multiple processes run on each compute node. Throttling the I/O requests issued to the target file system mitigates request contention and consequently improves performance in the file access phase of two-phase I/O. A topology-aware aggregator layout that accounts for multiple aggregators per compute node alleviates contention in the data aggregation phase, and stepwise data aggregation further improves aggregation performance. HPIO benchmark results on the K computer indicate that the proposed optimization achieves up to about 73% and 39% improvements in write performance over the original implementation using 12,288 and 24,576 processes on 3,072 and 6,144 compute nodes, respectively.
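
For readers unfamiliar with the mechanism being tuned, the sketch below shows an ordinary collective MPI-IO write through ROMIO's two-phase path using mpi4py, with standard collective-buffering hints. The hint values are illustrative, and the paper's topology-aware aggregator layout and I/O throttling live inside its modified ROMIO rather than being set through hints like these.

```python
# Hedged sketch of a collective MPI-IO write with two-phase (collective
# buffering) hints; filename and hint values are illustrative.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Standard ROMIO hints controlling the two-phase collective write path.
info = MPI.Info.Create()
info.Set("romio_cb_write", "enable")    # force collective buffering
info.Set("cb_nodes", "4")               # number of aggregator processes (illustrative)
info.Set("cb_buffer_size", "16777216")  # 16 MiB aggregation buffer (illustrative)

local = np.full(1 << 20, rank, dtype=np.float64)   # 8 MiB per process

fh = MPI.File.Open(comm, "checkpoint.dat",
                   MPI.MODE_WRONLY | MPI.MODE_CREATE, info)
fh.Set_view(0, MPI.DOUBLE, MPI.DOUBLE, "native", info)
# Collective write: every process participates, aggregators do the file access.
fh.Write_at_all(rank * local.size, local)
fh.Close()
info.Free()
```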


IEEE International Conference on High Performance Computing, Data, and Analytics | 2018

A Study on Open Source Software for Large-Scale Data Visualization on SPARC64fx based HPC Systems

Jorji Nonaka; Motohiko Matsuda; Takashi Shimizu; Naohisa Sakamoto; Masahiro Fujita; Keiji Onishi; Eduardo C. Inacio; Shun Ito; Fumiyoshi Shoji; Kenji Ono

In this paper, we present a study of the open-source software (OSS) available for large-scale data visualization on SPARC64fx based HPC systems, such as the K computer and the Fujitsu PRIMEHPC FX family of supercomputers (FX10 and FX100), which are widely deployed throughout Japan. These HPC systems have been generating vast amounts of simulation results in a wide range of science and engineering fields, yet little information has been available on large-scale data visualization software and approaches for such HPC infrastructure. In this work, we focus on visualization approaches in which the HPC hardware resources themselves are used for the visualization processing, which helps to minimize the large data transfers otherwise required for visualization and analysis. The study covers both OpenGL (Open Graphics Library) and non-OpenGL based visualization approaches, as well as the availability of GLSL (OpenGL Shading Language) handling functionality. Although it is a short survey focusing only on post-processing, we expect this study to be useful for current and future users of SPARC64fx CPU based HPC systems, which are still in active use throughout Japan.
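
As a concrete example of the "render on the HPC nodes themselves" approach surveyed above, the following sketch performs off-screen rendering with VTK, one such OSS package. It assumes a VTK build with software off-screen rendering support (e.g. OSMesa/llvmpipe), which is what a GPU-less SPARC64fx node would rely on, and it renders a stand-in sphere rather than real simulation output.

```python
# Minimal off-screen (windowless) rendering sketch with VTK; assumes a VTK
# build that supports software rendering on a display-less compute node.
import vtk

source = vtk.vtkSphereSource()          # stand-in for real simulation data
source.SetThetaResolution(64)
source.SetPhiResolution(64)

mapper = vtk.vtkPolyDataMapper()
mapper.SetInputConnection(source.GetOutputPort())
actor = vtk.vtkActor()
actor.SetMapper(mapper)

renderer = vtk.vtkRenderer()
renderer.AddActor(actor)
renderer.SetBackground(0.1, 0.1, 0.2)

window = vtk.vtkRenderWindow()
window.SetOffScreenRendering(1)         # no X server / display required
window.AddRenderer(renderer)
window.SetSize(1024, 1024)
window.Render()

# Dump the framebuffer to an image file for post-hoc inspection.
to_image = vtk.vtkWindowToImageFilter()
to_image.SetInput(window)
to_image.Update()
writer = vtk.vtkPNGWriter()
writer.SetFileName("render.png")
writer.SetInputConnection(to_image.GetOutputPort())
writer.Write()
```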


Archive | 2018

A Transfer Entropy Based Visual Analytics System for Identifying Causality of Critical Hardware Failures - Case Study: CPU Failures in the K Computer

Kazuki Koiso; Naohisa Sakamoto; Jorji Nonaka; Fumiyoshi Shoji

Large-scale scientific computing facilities usually operate expensive HPC (High Performance Computing) systems whose computational and storage resources are shared among authorized users. On such shared-resource systems, continuous and stable operation is fundamental for providing the hardware resources needed by different users, including the large-scale numerical simulations that are the main targets of such facilities. For instance, the K computer installed at the R-CCS (RIKEN Center for Computational Science) in Kobe, Japan, enables users to continuously run large jobs with tens of thousands of nodes (a maximum of 36,864 compute nodes) for up to 24 h, and a huge job using the entire system (82,944 compute nodes) for up to 8 h. Critical hardware failures directly impact the affected job and may also indirectly impact the scheduled subsequent jobs. To monitor the health of the K computer and its supporting facility, a large number of sensors provide a vast amount of measured data. Since it is almost impossible to analyze all of these data in real time, they are stored as log files for post-hoc analysis. In this work, we propose a visual analytics system that uses these large log files to identify possible causes of critical hardware failures. We adopt the transfer entropy technique to quantify the “causality” between a possible cause and a critical hardware failure. As a case study, we focus on critical CPU failures that required subsequent replacement, and use the log files of measured cooling-system temperatures such as air and water. We evaluated the usability of the proposed system through practical evaluations with a group of experts who work directly on the K computer system operation. The positive and negative feedback obtained from this evaluation will be considered for future enhancements.
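
The core quantity behind the system is transfer entropy, TE(X→Y), which measures how much knowing a candidate cause X (e.g. cooling-water temperature) improves prediction of an effect Y (e.g. CPU temperature) beyond Y's own past. The sketch below estimates it from binned time series with NumPy; the synthetic data and bin count are illustrative and unrelated to the K computer's actual logs.

```python
# Histogram-based transfer entropy estimate (history length 1), in bits.
import numpy as np

def transfer_entropy(x, y, bins=8):
    """TE(X -> Y): extra predictability of y[t+1] from x[t] beyond y[t]."""
    # Discretize both series into `bins` states.
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    y_next, y_now, x_now = yd[1:], yd[:-1], xd[:-1]

    def joint_prob(*series):
        edges = [np.arange(bins + 1) - 0.5] * len(series)
        counts, _ = np.histogramdd(np.column_stack(series), bins=edges)
        return counts / counts.sum()

    p_xyz = joint_prob(y_next, y_now, x_now)        # p(y_{t+1}, y_t, x_t)
    p_yz  = p_xyz.sum(axis=0, keepdims=True)        # p(y_t, x_t)
    p_xy  = p_xyz.sum(axis=2, keepdims=True)        # p(y_{t+1}, y_t)
    p_y   = p_xyz.sum(axis=(0, 2), keepdims=True)   # p(y_t)

    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = (p_xyz * p_y) / (p_yz * p_xy)       # p(y+|y,x) / p(y+|y)
        terms = np.where(p_xyz > 0, p_xyz * np.log2(ratio), 0.0)
    return terms.sum()

# Synthetic example: y lags x by one step, so TE(x->y) should exceed TE(y->x).
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.8 * np.roll(x, 1) + 0.2 * rng.normal(size=5000)
print(f"TE(x->y) = {transfer_entropy(x, y):.3f} bits")
print(f"TE(y->x) = {transfer_entropy(y, x):.3f} bits")
```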


International Journal of Modeling, Simulation, and Scientific Computing | 2018

Data I/O management approach for the post-hoc visualization of big simulation data results

Jorji Nonaka; Eduardo C. Inacio; Kenji Ono; Mario A. R. Dantas; Yasuhiro Kawashima; Tomohiro Kawanabe; Fumiyoshi Shoji

Leading-edge supercomputers, such as the K computer, have generated a vast amount of simulation results, and most of these datasets were stored on the file system for the post-hoc analysis such as ...

Collaboration


Dive into Fumiyoshi Shoji's collaborations.

Top Co-Author

Atsuya Uno (Japan Agency for Marine-Earth Science and Technology)