Publication


Featured research published by Akihiro Ida.


Journal of Information Processing | 2014

Parallel Hierarchical Matrices with Adaptive Cross Approximation on Symmetric Multiprocessing Clusters

Akihiro Ida; Takeshi Iwashita; Takeshi Mifune; Yasuhito Takahashi

We discuss a scheme for hierarchical matrices with adaptive cross approximation on symmetric multiprocessing clusters. We propose a set of parallel algorithms that are applicable to hierarchical matrices. The proposed algorithms are implemented using the flat-MPI and hybrid MPI+OpenMP programming models. The performance of these implementations is evaluated using an electric field analysis computed on two symmetric multiprocessing cluster systems. Although the flat-MPI version gives better parallel scalability when constructing hierarchical matrices, its speed-up reaches a limit in the hierarchical matrix-vector multiplication. We therefore developed a hybrid MPI+OpenMP version to improve the parallel scalability. In numerical experiments, the hybrid version exhibits better parallel speed-up for the hierarchical matrix-vector multiplication on up to 256 cores.
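The following is a minimal sketch of the general structure of such a hybrid MPI+OpenMP H-matrix-vector multiplication: each MPI rank owns a subset of low-rank leaf blocks (stored as factors U and V), OpenMP threads process the leaves, and partial results are combined with an MPI reduction. The data layout and function names are illustrative assumptions, not the implementation described in the paper.

```c
/* Illustrative sketch (not the paper's code): hybrid MPI+OpenMP
 * H-matrix-vector multiplication.  Each MPI rank owns a subset of
 * low-rank leaf blocks stored as U (m x k) and V (n x k), so each
 * block contributes y += U * (V^T * x). */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

typedef struct {
    int row0, m;      /* first row index and number of rows       */
    int col0, n;      /* first column index and number of columns */
    int k;            /* approximation rank of the leaf block     */
    double *U, *V;    /* column-major factors of the leaf block   */
} LeafBlock;

void hmvm_hybrid(const LeafBlock *leaves, int nleaves,
                 const double *x, double *y_local, double *y, int N)
{
    for (int i = 0; i < N; ++i) y_local[i] = 0.0;

    /* Thread-level parallelism over the leaves owned by this rank. */
    #pragma omp parallel
    {
        double *t = calloc(N, sizeof(double));   /* per-thread buffer */
        #pragma omp for schedule(dynamic)
        for (int b = 0; b < nleaves; ++b) {
            const LeafBlock *L = &leaves[b];
            for (int j = 0; j < L->k; ++j) {
                /* w = V(:,j)^T * x_block */
                double w = 0.0;
                for (int c = 0; c < L->n; ++c)
                    w += L->V[c + j * L->n] * x[L->col0 + c];
                /* y_block += U(:,j) * w */
                for (int r = 0; r < L->m; ++r)
                    t[L->row0 + r] += L->U[r + j * L->m] * w;
            }
        }
        #pragma omp critical
        for (int i = 0; i < N; ++i) y_local[i] += t[i];
        free(t);
    }
    /* Process-level reduction of the partial results. */
    MPI_Allreduce(y_local, y, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
}
```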


International Conference on Conceptual Structures | 2017

Software Framework for Parallel BEM Analyses with H-matrices Using MPI and OpenMP

Takeshi Iwashita; Akihiro Ida; Takeshi Mifune; Yasuhito Takahashi

We developed a software framework for boundary element analyses. The software supports a hybrid parallel programming model and is equipped with a hierarchical matrix (H-matrix) library to accelerate the BEM analysis.


IEEE Transactions on Magnetics | 2016

Variable Preconditioning of Krylov Subspace Methods for Hierarchical Matrices With Adaptive Cross Approximation

Akihiro Ida; Takeshi Iwashita; Takeshi Mifune; Yasuhito Takahashi

This paper discusses Krylov subspace methods to solve a linear system whose coefficient matrix is represented by a hierarchical matrix. We propose a preconditioning technique using a part of the original hierarchical matrix to accelerate the convergence of the Krylov subspace methods. The proposed preconditioning technique is based on the assumption that the submatrices of the original hierarchical matrix are approximated using the adaptive cross approximation or variants thereof. The performance of Krylov subspace methods with the proposed preconditioning technique is examined through numerical experiments on an electrostatic field analysis.
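As a rough sketch of the idea (an assumption-laden illustration, not the authors' implementation), the preconditioner can be realized as a few inner Krylov iterations with a reduced operator M built from part of the H-matrix, for example only its dense near-field leaves; because the inner solve is inexact and nonlinear in the residual, the preconditioner varies between outer iterations, so a flexible outer Krylov method is assumed. CG is used for the inner solve purely for brevity and presumes M is symmetric positive definite.

```c
/* Illustrative "variable" preconditioner: z ~= M^{-1} r computed by a
 * fixed, small number of inner CG iterations with a reduced operator M
 * (e.g., part of the H-matrix).  Not the paper's implementation. */
#include <stdlib.h>

typedef void (*MatVec)(const double *x, double *y, int n, void *ctx);

void apply_variable_precond(MatVec M_apply, void *ctx, const double *r,
                            double *z, int n, int inner_iters)
{
    double *p = malloc(n * sizeof *p), *q = malloc(n * sizeof *q),
           *res = malloc(n * sizeof *res);
    double rho = 0.0;
    for (int i = 0; i < n; ++i) { z[i] = 0.0; res[i] = p[i] = r[i]; rho += r[i] * r[i]; }
    for (int it = 0; it < inner_iters && rho > 0.0; ++it) {
        M_apply(p, q, n, ctx);                       /* q = M p */
        double pq = 0.0;
        for (int i = 0; i < n; ++i) pq += p[i] * q[i];
        double alpha = rho / pq, rho_new = 0.0;
        for (int i = 0; i < n; ++i) {
            z[i]   += alpha * p[i];                  /* update approximate solution */
            res[i] -= alpha * q[i];                  /* update inner residual       */
            rho_new += res[i] * res[i];
        }
        double beta = rho_new / rho;
        for (int i = 0; i < n; ++i) p[i] = res[i] + beta * p[i];
        rho = rho_new;
    }
    free(p); free(q); free(res);
}
```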


Journal of Information Processing | 2015

Improvement of Hierarchical Matrices with Adaptive Cross Approximation for Large-scale Simulation

Akihiro Ida; Takeshi Iwashita; Makiko Ohtani; Kazuro Hirahara

We propose an improved method for hierarchical matrices (H-matrices) that uses adaptive cross approximation (ACA) as the low-rank approximation. The improvement consists of a kind of normalization and a new stopping criterion for the ACA. With the proposed method, we avoid the problem that the ranks of the approximated submatrices increase rapidly with the matrix size when conventional H-matrices with ACA are applied to an integral equation whose kernel function has high-order singularities. In particular, the proposed method enables large-scale simulations in which conventional H-matrices with ACA fail to construct the low-rank approximation. The applicability of the proposed method is confirmed through numerical experiments on an earthquake cycle simulation.
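For reference, a basic partially pivoted ACA loop with the conventional stopping criterion is sketched below; the normalization and the new stopping criterion proposed in the paper are not reproduced here, and the entry-evaluation callback and storage layout are illustrative assumptions.

```c
/* Basic partially pivoted ACA (illustrative; the paper's normalization
 * and improved stopping criterion are NOT reproduced here).  Builds a
 * rank-k approximation A ~= U * V^T from individually evaluated kernel
 * entries a(i, j). */
#include <math.h>
#include <stddef.h>

typedef double (*EntryFn)(int i, int j, void *ctx);

int aca(EntryFn a, void *ctx, int m, int n, double eps, int kmax,
        double *U, double *V)     /* U: m x kmax, V: n x kmax (column-major) */
{
    double norm2 = 0.0;           /* running estimate of ||A_k||_F^2          */
    int i_piv = 0, k = 0;
    while (k < kmax) {
        double *u = U + (size_t)k * m, *v = V + (size_t)k * n;

        /* Residual row i_piv: v = a(i_piv, :) - sum_l U(i_piv, l) * V(:, l) */
        for (int j = 0; j < n; ++j) {
            v[j] = a(i_piv, j, ctx);
            for (int l = 0; l < k; ++l)
                v[j] -= U[(size_t)l * m + i_piv] * V[(size_t)l * n + j];
        }
        int j_piv = 0;
        for (int j = 1; j < n; ++j) if (fabs(v[j]) > fabs(v[j_piv])) j_piv = j;
        if (v[j_piv] == 0.0) break;   /* residual row vanished: done          */
        double piv = v[j_piv];
        for (int j = 0; j < n; ++j) v[j] /= piv;

        /* Residual column j_piv: u = a(:, j_piv) - sum_l V(j_piv, l) * U(:, l) */
        for (int i = 0; i < m; ++i) {
            u[i] = a(i, j_piv, ctx);
            for (int l = 0; l < k; ++l)
                u[i] -= V[(size_t)l * n + j_piv] * U[(size_t)l * m + i];
        }
        double nu = 0.0, nv = 0.0;
        for (int i = 0; i < m; ++i) nu += u[i] * u[i];
        for (int j = 0; j < n; ++j) nv += v[j] * v[j];
        norm2 += nu * nv;
        ++k;
        /* Conventional criterion: ||u_k|| * ||v_k|| <= eps * ||A_k||_F (estimated) */
        if (sqrt(nu * nv) <= eps * sqrt(norm2)) break;

        /* Next pivot row: largest |u[i]| (previously used rows are ~zero).    */
        i_piv = 0;
        for (int i = 1; i < m; ++i) if (fabs(u[i]) > fabs(u[i_piv])) i_piv = i;
    }
    return k;                     /* achieved rank                             */
}
```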


International Conference on Computational Science | 2018

Design of Parallel BEM Analyses Framework for SIMD Processors.

Tetsuya Hoshino; Akihiro Ida; Toshihiro Hanawa; Kengo Nakajima

Parallel boundary element method (BEM) analyses are typically conducted using a purpose-built software framework called BEM-BB. This framework requires a user-defined function that calculates the entry in the i-th row and j-th column of the coefficient matrix arising from the convolution integral term in the fundamental BEM equation. Owing to this design, the framework can encapsulate MPI and OpenMP hybrid parallelization with H-matrix approximation. Users can therefore focus on implementing a fundamental solution or Green's function, which is the most important element in BEM and depends on the targeted physical phenomenon, as a user-defined function. However, the framework does not consider single instruction multiple data (SIMD) vectorization, which is important for high-performance computing and is supported by the majority of existing processors. SIMD vectorization of a user-defined function is difficult because SIMD exploits instruction-level parallelism and is closely tied to the user-defined function. In this paper, a framework design for enhancing SIMD vectorization is proposed. The proposed framework is evaluated using two BEM problems, static electric field analysis with a perfect conductor and static electric field analysis with a dielectric, on an Intel Broadwell (BDW) processor and an Intel Xeon Phi Knights Landing (KNL) processor. The numerical results show that it offers good vectorization performance while requiring only limited SIMD knowledge from the user. Specifically, in the perfect-conductor analyses conducted using the H-matrix, the framework achieved performance improvements of 2.22x and 4.34x over the original BEM-BB framework on the BDW and KNL processors, respectively.
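The contrast below sketches the kind of interface change that makes SIMD vectorization tractable in such a framework: replacing an element-wise user function with one that fills a contiguous block of entries, so the trailing column loop becomes a clean SIMD target. The function names, the toy kernel, and the pragma usage are illustrative assumptions, not BEM-BB's actual API.

```c
/* Illustrative sketch of the idea (interface names are hypothetical,
 * not BEM-BB's actual API): instead of an element-wise user function
 * a(i, j), the framework asks the user to fill a whole row block at
 * once, so the trailing loop over columns can be SIMD-vectorized. */
#include <math.h>

/* Element-wise user function: hard for the framework to vectorize. */
double entry_scalar(int i, int j, const double *px, const double *py)
{
    double dx = px[i] - px[j], dy = py[i] - py[j];
    return 1.0 / sqrt(dx * dx + dy * dy + 1e-12);   /* toy kernel */
}

/* Block-wise user function: the column loop is a clean SIMD target. */
void entry_block(int i, int j0, int jn, const double *px, const double *py,
                 double *row)                        /* row[0..jn-1] */
{
    #pragma omp simd
    for (int j = 0; j < jn; ++j) {
        double dx = px[i] - px[j0 + j], dy = py[i] - py[j0 + j];
        row[j] = 1.0 / sqrt(dx * dx + dy * dy + 1e-12);
    }
}
```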


IEEE International Conference on High Performance Computing, Data and Analytics | 2018

Parallel Hierarchical Matrices with Block Low-rank Representation on Distributed Memory Computer Systems

Akihiro Ida; Hiroshi Nakashima; Masatoshi Kawai

Any hierarchical matrix (H-matrix) can be transformed into an H-matrix with block low-rank representation (BLR). Although matrix arithmetic with BLR is easier than with the normal H-matrix format, its memory usage is larger than the O(N log N) of the normal H-matrix. Therefore, BLR has so far been used for complex arithmetic operations on relatively small matrices on a CPU node. In this study, we discuss the efficiency of H-matrices with BLR for simple arithmetic operations, such as H-matrix generation and H-matrix-vector multiplication, in large-scale problems on distributed memory computer systems. We demonstrate how the BLR block size should be chosen in such problems and confirm that the memory usage of H-matrices with BLR is O(N^1.5) when the appropriate block size is used. We propose a set of parallel algorithms for H-matrices with BLR. In numerical experiments using electric field analyses, the speed-up of the simple arithmetic operations with BLR extends to about 10,000 MPI processes. We confirm that, even for simple H-matrix arithmetic, the BLR version is significantly faster than the normal H-matrix version when a large number of CPU cores are used.
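A back-of-the-envelope count (an illustrative estimate, not taken from the paper) indicates why an appropriately chosen uniform block size b yields O(N^1.5) memory, assuming a representative low rank k for the off-diagonal blocks:

```latex
% Rough memory estimate for a BLR matrix of order N with uniform block
% size b and representative off-diagonal rank k (illustrative only).
\begin{aligned}
\mathrm{mem}(b) &\approx \underbrace{\frac{N}{b}\, b^{2}}_{\text{dense diagonal blocks}}
  + \underbrace{\left(\frac{N}{b}\right)^{2} 2bk}_{\text{low-rank off-diagonal blocks}}
  = Nb + \frac{2kN^{2}}{b},\\
\frac{d\,\mathrm{mem}}{db} &= N - \frac{2kN^{2}}{b^{2}} = 0
  \;\Longrightarrow\; b^{\ast} = \sqrt{2kN},\qquad
  \mathrm{mem}(b^{\ast}) = 2\sqrt{2k}\,N^{3/2} = O(N^{1.5}).
\end{aligned}
```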


Lecture Notes in Computer Science | 2018

Optimization of Hierarchical Matrix Computation on GPU

Satoshi Ohshima; Ichitaro Yamazaki; Akihiro Ida; Rio Yokota

The demand for dense matrix computation in large-scale, complex simulations is increasing; however, the memory capacity of current computer systems is insufficient for such simulations. The hierarchical matrix method (H-matrices) is attracting attention as a computational method that can reduce the memory requirements of dense matrix computations. However, the computation of H-matrices is more complex than that of dense and sparse matrices; thus, accelerating H-matrix computation is required. We focus on H-matrix-vector multiplication (HMVM) on a single NVIDIA Tesla P100 GPU. We implement five GPU kernels and compare their execution times with OpenMP implementations on various processors (Broadwell-EP, Skylake-SP, and Knights Landing). The results show that, although an HMVM computation consists of many small GEMV operations, merging them into a single GPU kernel was the most effective implementation. Moreover, the performance of batched BLAS in the MAGMA library was comparable to that of the manually tuned GPU kernel.
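The batching idea can be sketched on the CPU side as follows (illustrative assumptions only; this is neither the paper's GPU kernels nor the MAGMA batched BLAS interface): the many small GEMV operations arising from H-matrix leaves are collected into an array of task descriptors, which a batched interface or a single merged kernel can then process together. Here an OpenMP loop stands in for the merged GPU kernel.

```c
/* Illustrative CPU-side sketch of the batching idea (assumptions, not
 * the paper's GPU kernels): the many small GEMVs arising from H-matrix
 * leaves are collected into descriptor arrays, as expected by batched
 * BLAS-style interfaces, and executed here by one OpenMP loop as a
 * stand-in for a single merged GPU kernel. */
typedef struct {
    int m, n;              /* block dimensions                           */
    const double *A;       /* m x n leaf factor (column-major)           */
    const double *x;       /* input slice of length n                    */
    double *y;             /* output slice of length m (private per task) */
} GemvTask;

void batched_gemv(const GemvTask *batch, int count)
{
    #pragma omp parallel for schedule(dynamic)
    for (int b = 0; b < count; ++b) {
        const GemvTask *t = &batch[b];
        for (int i = 0; i < t->m; ++i) {
            double s = 0.0;
            for (int j = 0; j < t->n; ++j)
                s += t->A[i + j * t->m] * t->x[j];
            t->y[i] += s;
        }
    }
}
```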


IEEE Conference on Electromagnetic Field Computation | 2016

Software framework for parallel BEM analyses with H-matrices

Takeshi Iwashita; Akihiro Ida; Takeshi Mifune; Yasuhito Takahashi

We developed a software framework for boundary element analyses. The software supports a hybrid parallel programming model and is equipped with a hierarchical matrix (H-matrix) library to accelerate the BEM analysis.


Trust, Security and Privacy in Computing and Communications | 2015

A New Fill-in Strategy for IC Factorization Preconditioning Considering SIMD Instructions

Takeshi Iwashita; Naokazu Takemura; Akihiro Ida; Hiroshi Nakashima

Most current processors are equipped with single instruction multiple data (SIMD) instructions that can be used to increase the performance of application programs. In this paper, we analyze the effective use of SIMD instructions in the Incomplete Cholesky (IC) preconditioned Conjugate Gradient (CG) solver, which we employ in a variety of simulations. A new fill-in strategy for the IC factorization is proposed to enable SIMD vectorization of the preconditioning step and to increase the convergence rate. Our numerical results confirm that the proposed method gives better solver performance than the conventional IC(0)-CG method.
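For context, the preconditioning step that the fill-in strategy targets is the pair of sparse triangular solves z = (L L^T)^{-1} r; a plain CSR sketch is shown below (background only, not the proposed strategy), where the loop-carried dependences are what make straightforward SIMD vectorization difficult.

```c
/* Illustrative sketch (not the paper's method): the IC preconditioning
 * step z = (L L^T)^{-1} r via forward and backward substitution on a
 * sparse lower-triangular factor L stored in CSR.  The loop-carried
 * dependence on earlier solution entries is what blocks straightforward
 * SIMD vectorization and what a fill-in strategy can relax. */
typedef struct {
    int n;
    const int *rowptr, *col;   /* CSR structure of L (lower triangular)    */
    const double *val;         /* nonzeros; diagonal stored last in each row */
} CsrLower;

void ic_apply(const CsrLower *L, const double *r, double *z, double *w)
{
    /* Forward substitution: L w = r */
    for (int i = 0; i < L->n; ++i) {
        double s = r[i];
        int end = L->rowptr[i + 1] - 1;             /* last entry = diagonal */
        for (int p = L->rowptr[i]; p < end; ++p)
            s -= L->val[p] * w[L->col[p]];          /* depends on earlier w  */
        w[i] = s / L->val[end];
    }
    /* Backward substitution: L^T z = w (scatter formulation) */
    for (int i = 0; i < L->n; ++i) z[i] = w[i];
    for (int i = L->n - 1; i >= 0; --i) {
        int end = L->rowptr[i + 1] - 1;
        z[i] /= L->val[end];
        for (int p = L->rowptr[i]; p < end; ++p)
            z[L->col[p]] -= L->val[p] * z[i];       /* scatter update        */
    }
}
```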


SIAM Conference on Parallel Processing for Scientific Computing | 2018

Performance Evaluation of Hierarchical Matrix Computation on Various Modern Architectures

Satoshi Ohshima; Ichitaro Yamazaki; Akihiro Ida; Rio Yokota

Collaboration


Dive into Akihiro Ida's collaborations.

Top Co-Authors

Rio Yokota

Tokyo Institute of Technology
