Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hidehiko Hasegawa is active.

Publication


Featured research published by Hidehiko Hasegawa.


International Workshop on OpenMP | 2005

Performance evaluation of parallel sparse matrix-vector products on SGI Altix3700

Hisashi Kotakemori; Hidehiko Hasegawa; Tamito Kajiyama; Akira Nukada; Reiji Suda; Akira Nishida

The present paper discusses scalable implementations of sparse matrix-vector products, which are crucial for high-performance solutions of large-scale linear equations, on the cc-NUMA machine SGI Altix3700. Three storage formats for sparse matrices are evaluated, and scalability is attained by implementations that account for the page allocation mechanism of the NUMA machine. The influence of the cache/memory bus architecture on the optimum choice of storage format is examined, and scalable converters between storage formats are shown to facilitate the exploitation of higher-performance storage formats.
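
To make the kernel concrete, here is a minimal sketch (not the paper's code) of a CRS sparse matrix-vector product parallelized with OpenMP. On a cc-NUMA machine such as the Altix3700, first-touch page placement is the kind of page allocation mechanism the abstract alludes to: arrays should be initialized with the same thread layout that later reads them.

```c
#include <omp.h>

/* Minimal CRS (compressed row storage) SpMV, y = A*x, parallelized over rows.
 * A sketch only: rowptr/colidx/val are the usual CRS arrays; for NUMA
 * scalability they should be first-touched by the same thread layout. */
void spmv_crs(int n, const int *rowptr, const int *colidx,
              const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * x[colidx[k]];
        y[i] = sum;
    }
}
```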


IEEE International Conference on High Performance Computing, Data and Analytics | 2005

Performance evaluation of a parallel iterative method library using OpenMP

Hisashi Kotakemori; Hidehiko Hasegawa; Akira Nishida

The present paper discusses scalable implementations of sparse matrix-vector products using OpenMP to execute iterative methods on the SGI Altix3700, the IBM eServer p5 595 and the Sun SunFire15K. Three storage formats (CRS, BSR and DIA) for sparse matrices are evaluated. The present implementation provides satisfactory scalability. In some cases, an optimal storage format with data conversion should be used. In addition, the influence of the cache/memory bus architecture on the optimum choice of storage format is examined.
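
Of the three formats, DIA differs most from CRS. A minimal sketch of SpMV in DIA (diagonal) storage (an illustration, not the library's code) shows why it suits banded matrices and regular memory access:

```c
/* DIA SpMV sketch: val holds ndiag diagonals, each stored as a column of
 * length n; offset[d] is the diagonal's distance from the main diagonal.
 * Element (i, i + offset[d]) lives at val[d*n + i]. */
void spmv_dia(int n, int ndiag, const double *val, const int *offset,
              const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = 0.0;
    for (int d = 0; d < ndiag; d++) {
        for (int i = 0; i < n; i++) {
            int j = i + offset[d];
            if (j >= 0 && j < n)
                y[i] += val[d * n + i] * x[j];
        }
    }
}
```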


Parallel Processing and Applied Mathematics | 2005

SILC: a flexible and environment-independent interface for matrix computation libraries

Tamito Kajiyama; Akira Nukada; Hidehiko Hasegawa; Reiji Suda; Akira Nishida

We propose a new framework, named Simple Interface for Library Collections (SILC), that gives users access to matrix computation libraries in a flexible and environment-independent manner. SILC achieves source-level independence between user programs and libraries by (1) separating a function call into data transfer and a request for computation, (2) requesting the computation by means of mathematical expressions in the form of text, and (3) using a separate memory space to carry out library functions independently of the user programs. Using SILC, users can easily access various libraries without any modification of the user programs. This paper describes the design and implementation of SILC based on a client-server architecture, and presents some experimental results on the performance of the implemented system in different computing environments.
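
The separation of data transfer from computation requests suggests a client program along the following lines. This is a hypothetical sketch of the PUT/EXEC/GET pattern the abstract describes; the identifiers (silc_envelope_t, SILC_PUT, SILC_EXEC, SILC_GET) are assumptions for illustration, not a confirmed API.

```c
/* Hypothetical sketch of a SILC-style client: data is transferred once,
 * then computation is requested as a textual mathematical expression.
 * All identifiers here are assumed for illustration only. */
silc_envelope_t A, b, x;            /* assumed container type for matrices/vectors */

SILC_PUT("A", &A);                  /* transfer data to the separate memory space */
SILC_PUT("b", &b);
SILC_EXEC("x = A \\ b");            /* request: solve the linear system, as text  */
SILC_GET(&x, "x");                  /* fetch the result back to the user program  */
```

Because the expression is plain text interpreted on the server side, the same user program can run against different library implementations and computing environments, which is the source-level independence the paper claims.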


Journal of Computational Science | 2012

Analysis of the GCR method with mixed precision arithmetic using QuPAT

Tsubasa Saito; Emiko Ishiwata; Hidehiko Hasegawa

To verify computation results of double precision arithmetic, a high precision arithmetic environment is needed. However, it is difficult to use high precision arithmetic in ordinary computing environments without special hardware or libraries. Hence, we designed the quadruple precision arithmetic environment QuPAT on Scilab to satisfy the following requirements: (i) to enable programs to be written simply using quadruple precision arithmetic; (ii) to enable the use of both double and quadruple precision arithmetic at the same time; (iii) to be independent of any hardware and operating system. To confirm the effectiveness of QuPAT, we applied the GCR method to ill-conditioned matrices and focused on the scalar parameters α and β in GCR, partially using double-double (DD) arithmetic. We found that using DD arithmetic only for β leads to almost the same results as using DD arithmetic for all computations. We conclude that QuPAT is an excellent interactive tool for using double precision and DD arithmetic at the same time.
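
The scalar parameters in GCR are ratios of dot products, so "DD arithmetic only for β" amounts to accumulating a dot product in double-double while the vectors stay in double. A minimal sketch of such a DD-accumulating dot product (the standard Dot2 scheme of Ogita, Rump and Oishi, shown as an illustration rather than QuPAT's own code):

```c
#include <math.h>

/* Dot product of double vectors with double-double accumulation (Dot2):
 * s carries the high part, c the accumulated low-order errors.
 * fma() makes the product error ep exact; the TwoSum steps make es exact. */
double dot_dd(int n, const double *x, const double *y)
{
    double s = 0.0, c = 0.0;
    for (int i = 0; i < n; i++) {
        double p  = x[i] * y[i];
        double ep = fma(x[i], y[i], -p);      /* exact rounding error of the product */
        double t  = s + p;
        double v  = t - s;
        double es = (s - (t - v)) + (p - v);  /* exact rounding error of the sum */
        s  = t;
        c += ep + es;
    }
    return s + c;   /* double result carrying the DD-accumulated correction */
}
```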


International Conference on Computational Science and Its Applications | 2010

Development of quadruple precision arithmetic toolbox QuPAT on Scilab

Tsubasa Saito; Emiko Ishiwata; Hidehiko Hasegawa

When floating point arithmetic is used in numerical computation, cancellation of significant digits, round-off errors and information loss cannot be avoided. In some cases it becomes necessary to use multiple precision arithmetic; however, some operations of this arithmetic are difficult to implement within conventional computing environments. In this paper we consider the implementation of a quadruple precision arithmetic environment QuPAT (Quadruple Precision Arithmetic Toolbox) as a toolbox for the interactive numerical software package Scilab. Based on Double-Double (DD) arithmetic, QuPAT uses only a combination of double precision arithmetic operations. QuPAT has three main characteristics: (1) the same operator is used for both double and quadruple precision arithmetic; (2) both double and quadruple precision arithmetic can be used at the same time, and mixed precision arithmetic is also available; (3) QuPAT is independent of the hardware and operating system used. Finally, we show the effectiveness of QuPAT in analyzing a convergence property of the GCR(m) method for a system of linear equations.
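
"Only a combination of double precision arithmetic operations" refers to error-free transformations: a DD number is an unevaluated sum of two doubles, and DD operations are composed from exact steps such as Knuth's TwoSum. A minimal sketch of the standard algorithms (not QuPAT's Scilab code):

```c
/* A double-double number: value = hi + lo, with lo far below ulp(hi). */
typedef struct { double hi, lo; } dd;

/* Knuth's TwoSum: s + e == a + b exactly, using only double additions. */
static void two_sum(double a, double b, double *s, double *e)
{
    *s = a + b;
    double v = *s - a;
    *e = (a - (*s - v)) + (b - v);
}

/* DD addition composed purely of double-precision operations
 * (a simplified common variant of the DD add algorithm). */
static dd dd_add(dd a, dd b)
{
    double s, e;
    two_sum(a.hi, b.hi, &s, &e);
    e += a.lo + b.lo;              /* fold in the low-order parts      */
    two_sum(s, e, &s, &e);         /* renormalize back to hi/lo form   */
    return (dd){ s, e };
}
```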


International Conference on Parallel Processing | 2013

AVX Acceleration of DD Arithmetic Between a Sparse Matrix and Vector

Toshiaki Hishinuma; Akihiro Fujii; Teruo Tanaka; Hidehiko Hasegawa

High precision arithmetic can improve the convergence of Krylov subspace methods; however, it is very costly. One system of high precision arithmetic is double-double (DD) arithmetic, which uses more than 20 double precision operations for one DD operation. We accelerated DD arithmetic using AVX SIMD instructions. With 4 threads, vector operations reach 51–59% of peak performance in cache and are bounded by the memory access speed out of cache. For SpMV, we used a double precision sparse matrix A and a DD vector x to reduce memory access, and achieved 17–41% of peak performance using padding at execution time. We also achieved 9–33% of peak performance for a transposed SpMV. In these cases, the performance was not bounded by memory access.
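
The SIMD speedup comes from applying the error-free building blocks of DD arithmetic to four doubles at once. A minimal sketch of Knuth's TwoSum with AVX intrinsics (an illustration of the idea, not the authors' code):

```c
#include <immintrin.h>

/* TwoSum on 4 packed doubles: lane by lane, s + e == a + b exactly.
 * This error-free addition is the core building block of DD arithmetic;
 * vectorizing it processes four DD components per instruction stream. */
static inline void two_sum_avx(__m256d a, __m256d b, __m256d *s, __m256d *e)
{
    __m256d sum = _mm256_add_pd(a, b);
    __m256d v   = _mm256_sub_pd(sum, a);
    *e = _mm256_add_pd(_mm256_sub_pd(a, _mm256_sub_pd(sum, v)),
                       _mm256_sub_pd(b, v));
    *s = sum;
}
```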


International Conference on Parallel Processing | 2013

Effectiveness of Sparse Data Structure for Double-Double and Quad-Double Arithmetics

Tsubasa Saito; Satoko Kikkawa; Emiko Ishiwata; Hidehiko Hasegawa

Double-double and quad-double arithmetic are effective tools for reducing round-off errors in floating-point arithmetic. However, the dense data structure for high-precision numbers in MuPAT/Scilab requires large amounts of memory and a great deal of computation time. We implemented sparse data types ddsp and qdsp for double-double and quad-double numbers. We show that a sparse data structure for high-precision arithmetic is practically useful for solving ill-conditioned systems of linear equations, improving convergence and obtaining accurate results in less computation time.
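
Although ddsp and qdsp are Scilab-side types, the underlying idea is a CRS-style structure whose value array stores high and low parts. A C sketch of such a layout (an assumed illustration, not MuPAT's actual representation):

```c
/* Sketch of a sparse matrix in CRS layout with double-double values:
 * only the nonzeros pay the two-doubles storage cost, instead of every
 * entry of a dense high-precision array. Field names are illustrative. */
typedef struct {
    int     n;        /* matrix dimension                 */
    int     nnz;      /* number of stored nonzeros        */
    int    *rowptr;   /* n+1 row start offsets            */
    int    *colidx;   /* nnz column indices               */
    double *val_hi;   /* high parts of the DD values      */
    double *val_lo;   /* low parts of the DD values       */
} ddsp_crs;
```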


High Performance Computing for Computational Science (Vector and Parallel Processing) | 2016

SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision

Toshiaki Hishinuma; Hidehiko Hasegawa; Teruo Tanaka

We accelerate a double-precision sparse matrix and DD vector multiplication (DD-SpMV) and its transposed counterpart (DD-TSpMV) using SIMD AVX2. AVX2 requires changing the memory access pattern so that four consecutive 64-bit elements can be read at once. In our previous research, DD-SpMV in CRS using AVX2 needed non-contiguous memory loads, remainder processing, and the summation of four elements in the AVX2 register. These factors degrade the performance of DD-SpMV. In this paper, we compare storage formats for DD-SpMV and DD-TSpMV under AVX2 to eliminate the performance degradation factors of CRS. Our results indicate that BCRS4x1, whose block size fits the AVX2 register length, is effective for both DD-SpMV and DD-TSpMV.
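
In BCRS4x1, each stored block is a 4x1 column of matrix entries sharing one column index, so four values load contiguously and multiply a single broadcast x element. A scalar sketch of the access pattern (plain double precision for brevity, the DD bookkeeping elided; an illustration, not the authors' code):

```c
/* BCRS4x1 SpMV sketch: rows are grouped in fours; each block stores 4
 * contiguous values of one column j, so the val loads are contiguous and
 * x[j] is reused (broadcast) across the four rows. Assumes the row count
 * is padded to a multiple of 4, as the format requires. */
void spmv_bcrs4x1(int nblockrows, const int *blockptr, const int *colidx,
                  const double *val, const double *x, double *y)
{
    for (int ib = 0; ib < nblockrows; ib++) {
        double sum[4] = {0.0, 0.0, 0.0, 0.0};
        for (int k = blockptr[ib]; k < blockptr[ib + 1]; k++) {
            double xj = x[colidx[k]];          /* one broadcast per block      */
            for (int r = 0; r < 4; r++)        /* contiguous 4-element load    */
                sum[r] += val[4 * k + r] * xj;
        }
        for (int r = 0; r < 4; r++)
            y[4 * ib + r] = sum[r];
    }
}
```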


International Workshop on Eigenvalue Problems: Algorithms, Software and Applications in Petascale Computing | 2015

A Parallel Bisection and Inverse Iteration Solver for a Subset of Eigenpairs of Symmetric Band Matrices

Hiroyuki Ishigami; Hidehiko Hasegawa; Kinji Kimura; Yoshimasa Nakamura

The tridiagonalization of real symmetric dense matrices and its back-transformation for computing eigenpairs are known to be the bottleneck of the execution time in parallel processing, owing to the communication cost and the number of floating-point operations. To overcome this problem, we focus on the real symmetric band eigensolvers proposed by Gupta and by Murata, since these eigensolvers are composed of the bisection and inverse iteration algorithms and include neither the tridiagonalization of real symmetric band matrices nor its back-transformation. In this paper, the following parallel solver for computing a subset of eigenpairs of real symmetric band matrices is proposed on the basis of Murata's eigensolver: the desired eigenvalues of the target band matrices are computed directly by using a parallel version of Murata's bisection algorithm, and the corresponding eigenvectors are computed by using a block inverse iteration algorithm with reorthogonalization, which can be parallelized with lower communication cost than the plain inverse iteration algorithm. Numerical experiments on shared-memory multi-core processors show that the proposed eigensolver is faster than conventional solvers.
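
Bisection eigensolvers rest on a Sturm count: the number of eigenvalues below a shift, read off from the signs of an LDL^T-style recurrence. A minimal sketch for the symmetric tridiagonal case (the classic building block; Murata's method extends the idea to band matrices):

```c
/* Number of eigenvalues of a symmetric tridiagonal matrix (diagonal
 * d[0..n-1], off-diagonal e[0..n-2]) that are smaller than x, via the
 * Sturm sequence. Bisection narrows an interval [lo, hi] until the
 * counts at its endpoints bracket the wanted eigenvalue. */
static int sturm_count(int n, const double *d, const double *e, double x)
{
    int count = 0;
    double q = d[0] - x;
    if (q < 0.0) count++;
    for (int i = 1; i < n; i++) {
        /* a robust implementation would guard against q == 0 here */
        q = (d[i] - x) - e[i - 1] * e[i - 1] / q;
        if (q < 0.0) count++;
    }
    return count;
}
```

Because each shift's count is independent, different eigenvalue intervals can be assigned to different threads, which is what makes the bisection stage parallelize well.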


International Workshop on Eigenvalue Problems: Algorithms, Software and Applications in Petascale Computing | 2015

Comparison of Tridiagonalization Methods Using High-Precision Arithmetic with MuPAT

Ryoya Ino; Kohei Asami; Emiko Ishiwata; Hidehiko Hasegawa

In general, when computing the eigenvalues of symmetric matrices, a matrix is first tridiagonalized using some orthogonal transformation. The Householder transformation, a tridiagonalization method, is accurate and stable for dense matrices, but is not applicable to sparse matrices because of the required memory space. The Lanczos and Arnoldi methods are also used for tridiagonalization and are applicable to sparse matrices, but these methods are sensitive to computational errors. To obtain a stable algorithm, it is necessary either to apply numerous techniques to the original algorithm or simply to use more accurate arithmetic in the original algorithm. In floating-point arithmetic, computation errors are unavoidable, but they can be reduced by using high-precision arithmetic, such as double-double (DD) or quad-double (QD) arithmetic. In the present study, we compare double, double-double, and quad-double arithmetic for three tridiagonalization methods: the Householder method, the Lanczos method, and the Arnoldi method. To evaluate the robustness of these methods, we applied them to dense matrices that are appropriate for the Householder method. It was found that, using high-precision arithmetic, the Arnoldi method can produce good tridiagonal matrices for some problems, whereas the Lanczos method cannot.
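
The Lanczos method's sensitivity comes from its three-term recurrence, where each rounding error feeds into the next basis vector. A minimal dense-matrix sketch of that recurrence (no reorthogonalization, plain double precision; an illustration of the algorithm whose working precision the paper varies):

```c
#include <math.h>
#include <string.h>

/* Basic Lanczos tridiagonalization of a symmetric n-by-n matrix A
 * (row-major): builds alpha[0..m-1] and beta[0..m-2] of the tridiagonal
 * matrix. v must arrive normalized; v_prev and w are n-sized work arrays.
 * Without reorthogonalization, rounding errors accumulate in v, which is
 * exactly what DD/QD arithmetic is meant to suppress. */
static void lanczos(int n, int m, const double *A, double *v_prev,
                    double *v, double *w, double *alpha, double *beta)
{
    memset(v_prev, 0, n * sizeof(double));
    double b = 0.0;                            /* beta_{j-1}, zero at start  */
    for (int j = 0; j < m; j++) {
        for (int i = 0; i < n; i++) {          /* w = A v                    */
            double s = 0.0;
            for (int k = 0; k < n; k++) s += A[i * n + k] * v[k];
            w[i] = s;
        }
        double a = 0.0;                        /* alpha_j = v' A v           */
        for (int i = 0; i < n; i++) a += v[i] * w[i];
        alpha[j] = a;
        for (int i = 0; i < n; i++)            /* w -= alpha v + beta v_prev */
            w[i] -= a * v[i] + b * v_prev[i];
        if (j == m - 1) break;
        b = 0.0;                               /* beta_j = ||w||             */
        for (int i = 0; i < n; i++) b += w[i] * w[i];
        b = sqrt(b);
        beta[j] = b;
        for (int i = 0; i < n; i++) {          /* v_prev <- v, v <- w/beta   */
            v_prev[i] = v[i];
            v[i] = w[i] / b;                   /* assumes no breakdown (b != 0) */
        }
    }
}
```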

Collaboration


Dive into Hidehiko Hasegawa's collaborations.

Top Co-Authors

Emiko Ishiwata (Tokyo University of Science)

Tsubasa Saito (Tokyo University of Science)

Tamito Kajiyama (Universidade Nova de Lisboa)