Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Ichitaro Yamazaki is active.

Publication


Featured research published by Ichitaro Yamazaki.


Research in Computational Molecular Biology | 2008

CompostBin: a DNA composition-based algorithm for binning environmental shotgun reads

Sourav Chatterji; Ichitaro Yamazaki; Zhaojun Bai; Jonathan A. Eisen

A major hindrance to studies of microbial diversity has been that the vast majority of microbes cannot be cultured in the laboratory and thus are not amenable to traditional methods of characterization. Environmental shotgun sequencing (ESS) overcomes this hurdle by sequencing the DNA from the organisms present in a microbial community. The interpretation of this metagenomic data can be greatly facilitated by associating every sequence read with its source organism. We report the development of CompostBin, a DNA composition-based algorithm for analyzing metagenomic sequence reads and distributing them into taxon-specific bins. Unlike previous methods that seek to bin assembled contigs and often require training on known reference genomes, CompostBin has the ability to accurately bin raw sequence reads without the need for assembly or training. CompostBin uses a novel weighted PCA algorithm to project the high-dimensional DNA composition data into an informative lower-dimensional space, and then uses the normalized cut clustering algorithm on this filtered data set to classify sequences into taxon-specific bins. We demonstrate the algorithm's accuracy on a variety of low- to medium-complexity data sets.
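The core pipeline, k-mer composition vectors, dimension reduction, then clustering, can be sketched in a few lines. This is not the CompostBin implementation: it uses plain (unweighted) PCA and a tiny 2-means in place of the paper's weighted PCA and normalized-cut clustering, and a short k-mer length for brevity.

```python
import numpy as np

def kmer_freqs(seq, k=2):
    """Frequency vector of overlapping k-mers over the A/C/G/T alphabet."""
    idx = {b: i for i, b in enumerate("ACGT")}
    v = np.zeros(4 ** k)
    for i in range(len(seq) - k + 1):
        code = 0
        for b in seq[i:i + k]:
            code = code * 4 + idx[b]
        v[code] += 1
    return v / max(1, len(seq) - k + 1)

def bin_reads(reads, k=2, iters=50):
    """Project composition vectors onto the top-2 PCA axes, then split
    into two bins with a minimal k-means (a stand-in for the paper's
    normalized-cut step)."""
    X = np.array([kmer_freqs(r, k) for r in reads])
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)   # PCA via SVD
    Y = X @ Vt[:2].T
    # deterministic init: first point and the point farthest from it
    far = np.argmax(((Y - Y[0]) ** 2).sum(axis=1))
    centers = np.array([Y[0], Y[far]])
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        for j in range(2):
            if (labels == j).any():
                centers[j] = Y[labels == j].mean(axis=0)
    return labels
```

On synthetic reads drawn from an AT-rich and a GC-rich "genome", the two groups separate cleanly in the projected space.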


ACM Transactions on Mathematical Software | 2010

Adaptive Projection Subspace Dimension for the Thick-Restart Lanczos Method

Ichitaro Yamazaki; Zhaojun Bai; Horst D. Simon; Lin-Wang Wang; Kesheng Wu

The Thick-Restart Lanczos (TRLan) method is an effective method for solving large-scale Hermitian eigenvalue problems. The performance of the method strongly depends on the dimension of the projection subspace used at each restart. In this article, we propose an objective function to quantify the effectiveness of the selection of subspace dimension, and then introduce an adaptive scheme to dynamically select the dimension to optimize the performance. We have developed an open-source software package, ν-TRLan, that includes this adaptive scheme in the TRLan method. When applied to calculate the electronic structure of quantum dots, ν-TRLan runs up to 2.3x faster than a state-of-the-art preconditioned conjugate gradient eigensolver.
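The inner iteration that gets restarted can be sketched as plain Lanczos with full reorthogonalization; the subspace dimension m below is exactly the parameter the paper's adaptive scheme tunes. This is an illustrative sketch of the Lanczos process only, not the TRLan package's code, and it omits the thick restart itself.

```python
import numpy as np

def lanczos_extreme(A, m=30, seed=0):
    """Plain Lanczos: build an m-dimensional Krylov subspace for
    symmetric A and return the largest Ritz value. The cost/accuracy
    trade-off is governed by m, the quantity the paper's adaptive
    scheme selects at each restart."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q, alpha, beta = [q], [], []
    for j in range(m):
        w = A @ Q[-1]
        if j > 0:
            w -= beta[-1] * Q[-2]
        a = Q[-1] @ w
        w -= a * Q[-1]
        # full reorthogonalization for robustness (cheap at this scale)
        for v in Q:
            w -= (v @ w) * v
        alpha.append(a)
        b = np.linalg.norm(w)
        if b < 1e-12:
            break
        beta.append(b)
        Q.append(w / b)
    T = np.diag(alpha)                      # projected tridiagonal matrix
    for i, b in enumerate(beta[:len(alpha) - 1]):
        T[i, i + 1] = T[i + 1, i] = b
    return np.linalg.eigvalsh(T)[-1]
```

For an extreme eigenvalue, the largest Ritz value converges long before the subspace reaches the full problem dimension.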


International Parallel and Distributed Processing Symposium | 2014

Improving the Performance of CA-GMRES on Multicores with Multiple GPUs

Ichitaro Yamazaki; Hartwig Anzt; Stanimire Tomov; Mark Hoemmen; Jack J. Dongarra

The Generalized Minimum Residual (GMRES) method is one of the most widely-used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because in comparison to floating-point operations, communication is becoming increasingly expensive on modern computers. Since graphics processing units (GPUs) are now becoming a crucial component in computing, we investigate the effectiveness of these techniques on multicore CPUs with multiple GPUs. While we present the detailed performance studies of a matrix powers kernel on multiple GPUs, we particularly focus on orthogonalization strategies that have a great impact on both the numerical stability and performance of GMRES, especially as the matrix becomes sparser or more ill-conditioned. We present the experimental results on two eight-core Intel Sandy Bridge CPUs with three NVIDIA Fermi GPUs and demonstrate that significant speedups can be obtained by avoiding communication, either on a GPU or between the GPUs. As part of our study, we investigate several optimization techniques for the GPU kernels that can also be used in other iterative solvers besides GMRES. Hence, our studies not only emphasize the importance of avoiding communication on GPUs, but they also provide insight about the effects of these optimization techniques on the performance of the sparse solvers, and may have greater impact beyond GMRES.
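The two ingredients discussed above, the matrix powers kernel and block orthogonalization, can be sketched as follows. This is an illustrative NumPy sketch of the communication-avoiding idea, not the paper's GPU code: it uses the monomial basis and CGS2 followed by a local QR, which is one of several orthogonalization strategies such methods compare.

```python
import numpy as np

def matrix_powers(A, v, s):
    """Matrix powers kernel: build the s-step Krylov basis
    [v, Av, ..., A^s v] in one sweep, replacing s separate
    SpMV-then-orthogonalize rounds. The monomial basis is used for
    simplicity; Newton or Chebyshev bases improve stability."""
    V = np.empty((len(v), s + 1))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(s):
        V[:, j + 1] = A @ V[:, j]
    return V

def block_orthogonalize(Q, V):
    """Block classical Gram-Schmidt against the existing basis Q, then
    one tall-skinny QR on the block: a few matrix-matrix products and a
    local QR instead of many latency-bound vector reductions."""
    W = V - Q @ (Q.T @ V)
    W = W - Q @ (Q.T @ W)      # second pass (CGS2) for stability
    Qnew, _ = np.linalg.qr(W)
    return Qnew
```

The block version trades many small inner products (each a synchronization point) for dense matrix products that map well onto GPUs.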


Concurrency and Computation: Practice and Experience | 2014

Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems

Ichitaro Yamazaki; Tingxing Dong; Raffaele Solcà; Stanimire Tomov; Jack J. Dongarra; Thomas C. Schulthess

For software to fully exploit the computing power of emerging heterogeneous computers, not only must the required computational kernels be optimized for the specific hardware architectures, but an effective scheduling scheme is also needed to utilize the available heterogeneous computational units and to hide the communication between them. As a case study, we develop a static scheduling scheme for the tridiagonalization of a symmetric dense matrix on multicore CPUs with multiple graphics processing units (GPUs) on a single compute node. We then parallelize and optimize the Basic Linear Algebra Subprograms (BLAS) level-2 symmetric matrix-vector multiplication and the level-3 low-rank symmetric matrix updates on the GPUs. We demonstrate the good scalability of these multi-GPU BLAS kernels and the effectiveness of our scheduling scheme on twelve Intel Xeon processors and three NVIDIA GPUs. We then integrate our hybrid CPU-GPU kernel into higher-level computational kernels in the software stack, namely a shared-memory dense eigensolver and a distributed-memory sparse eigensolver. Our experimental results show that our kernels greatly improve the performance of these higher-level kernels, not only reducing the solution time but also enabling the solution of larger-scale problems. Because such symmetric eigenvalue problems arise in many scientific and engineering simulations, our kernels could potentially lead to new scientific discoveries. Furthermore, these dense linear algebra algorithms present algorithmic characteristics that can be found in other algorithms. Hence, they are not only important computational kernels on their own but also useful testbeds to study the performance of the emerging computers and the effects of the various optimization techniques.
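A minimal unblocked sketch of the underlying reduction is below. The paper's contribution is the blocked multi-GPU version in which the symmetric matrix-vector product (BLAS-2 symv) and the rank-2 symmetric update (BLAS-3 syr2k when blocked) are distributed across GPUs; this sketch shows only the numerical skeleton.

```python
import numpy as np

def tridiagonalize(A):
    """Householder tridiagonalization: returns T = Q^T A Q, symmetric
    tridiagonal with the same eigenvalues as A. Each step is dominated
    by a symmetric matrix-vector product plus a symmetric rank-2
    update of the trailing block."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n - 2):
        x = A[k + 1:, k].copy()
        alpha = -np.copysign(np.linalg.norm(x), x[0])
        v = x
        v[0] -= alpha
        nv = np.linalg.norm(v)
        if nv == 0.0:
            continue
        v /= nv                               # Householder reflector H = I - 2 v v^T
        w = A[k + 1:, k + 1:] @ v             # symv: the dominant cost per step
        u = w - (v @ w) * v
        # symmetric rank-2 update: A22 <- H A22 H = A22 - 2 (v u^T + u v^T)
        A[k + 1:, k + 1:] -= 2.0 * (np.outer(v, u) + np.outer(u, v))
        A[k + 1, k] = A[k, k + 1] = alpha
        A[k + 2:, k] = 0.0
        A[k, k + 2:] = 0.0
    return A
```

Because each step is a similarity transform, the tridiagonal result keeps the spectrum of the input, which is what the downstream eigensolvers consume.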


IEEE International Conference on High Performance Computing, Data and Analytics | 2010

On techniques to improve robustness and scalability of a parallel hybrid linear solver

Ichitaro Yamazaki; Xiaoye S. Li

A hybrid linear solver based on the Schur complement method has great potential to be a general-purpose solver scalable on tens of thousands of processors. For this, it is imperative to exploit two levels of parallelism; namely, solving independent subdomains in parallel and using multiple processors per subdomain. This hierarchical parallelism can lead to a scalable implementation which maintains numerical stability at the same time. In this framework, load imbalance and excessive communication, which can lead to performance bottlenecks, occur at two levels: within an intra-processor group assigned to the same subdomain and among inter-processor groups assigned to different subdomains. We developed several techniques to address these issues, such as taking advantage of the sparsity of right-hand sides during the triangular solutions with interfaces, load balancing sparse matrix-matrix multiplication to form update matrices, and designing an effective asynchronous point-to-point communication of the update matrices. We present numerical results to demonstrate that with the help of these techniques, our hybrid solver can efficiently solve large-scale highly-indefinite linear systems on thousands of processors.
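The Schur complement framework can be illustrated with a small dense NumPy sketch. The real solver works with sparse matrices, parallel subdomain factorizations, and the communication optimizations described above; the function below only shows the algebra, and its name is illustrative.

```python
import numpy as np

def schur_solve(A, b, parts, interface):
    """Schur complement solve: eliminate each subdomain interior
    independently (the first level of parallelism), assemble the
    interface Schur complement S and its right-hand side, solve the
    smaller interface system, then back-substitute into the interiors."""
    S = A[np.ix_(interface, interface)].astype(float)
    rhs = b[interface].astype(float)
    pieces = []
    for I in parts:                          # independent per subdomain
        Aii = A[np.ix_(I, I)]
        Aig = A[np.ix_(I, interface)]
        Agi = A[np.ix_(interface, I)]
        Zi = np.linalg.solve(Aii, Aig)       # sparse triangular solves in practice
        yi = np.linalg.solve(Aii, b[I])
        S -= Agi @ Zi                        # the "update matrix" contribution
        rhs -= Agi @ yi
        pieces.append((I, Zi, yi))
    x = np.zeros(len(b))
    x[interface] = np.linalg.solve(S, rhs)   # interface system
    for I, Zi, yi in pieces:
        x[I] = yi - Zi @ x[interface]
    return x
```

The per-subdomain loop is where the paper's techniques apply: the update matrices Agi @ Zi are what get load-balanced and communicated asynchronously.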


International Conference on Conceptual Structures | 2012

One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators

Ichitaro Yamazaki; Stanimire Tomov; Jack J. Dongarra

One-sided dense matrix factorizations are important computational kernels in many scientific and engineering simulations. In this paper, we propose two extensions of both right-looking (LU and QR) and left-looking (Cholesky) one-sided factorization algorithms to utilize the computing power of current heterogeneous architectures. We first describe a new class of non-GPU-resident algorithms that factorize only a submatrix of a coefficient matrix on a GPU at a time. We then extend the algorithms to use multiple GPUs attached to a multicore. These extensions not only enable the factorization of a matrix that does not fit in the aggregated memory of the multiple GPUs at once, but also provide the potential to fully utilize the computing power of the architectures. Since data movement is expensive on the current architectures, these algorithms are designed to minimize the data movement at multiple levels. To demonstrate the effectiveness of these algorithms, we present their performance on a single compute node of the Keeneland system, which consists of twelve Intel Xeon processors and three NVIDIA GPUs. The performance results show negligible overheads for our non-GPU-resident algorithms and scalable performance for our multi-GPU algorithms. These extensions are now part of the MAGMA software package, a set of state-of-the-art dense linear algebra routines for a multicore with GPUs.
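As a flavor of the left-looking pattern, here is a minimal blocked left-looking Cholesky in NumPy. It is a sketch of the algorithmic structure only; the paper's contribution is streaming such panels through one or more GPUs when the matrix exceeds GPU memory, and the triangular solve below is done naively.

```python
import numpy as np

def cholesky_left_looking(A, nb=3):
    """Blocked left-looking Cholesky: each panel is first updated with
    all previously factored panels (one large GEMM, the work that gets
    offloaded to GPUs), then its diagonal block is factored and the
    sub-diagonal panel is obtained by a triangular solve (TRSM)."""
    A = np.tril(np.array(A, dtype=float))    # work on the lower triangle
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        if k > 0:                            # left-looking update (GEMM)
            A[k:, k:e] -= A[k:, :k] @ A[k:e, :k].T
        # factor the small diagonal block (numpy's cholesky reads only
        # the lower triangle, so the stale upper entries are harmless)
        A[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])
        if e < n:                            # TRSM: panel <- panel * L_kk^{-T}
            A[e:, k:e] = np.linalg.solve(A[k:e, k:e], A[e:, k:e].T).T
    return A
```

Left-looking variants touch each panel once with a single accumulated update, which is what makes the one-panel-at-a-time, non-GPU-resident streaming natural.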


IEEE International Conference on Shape Modeling and Applications | 2006

Segmenting Point Sets

Ichitaro Yamazaki; Vijay Natarajan; Zhaojun Bai; Bernd Hamann

Extracting features from point sets is becoming increasingly important for purposes like model classification, matching, and exploration. We introduce a technique for segmenting a point-sampled surface into distinct features without explicit construction of a mesh or other surface representation. Our approach achieves computational efficiency through a three-phase segmentation process. The first phase of the process uses a topological approach to define features and coarsens the input, resulting in a set of supernodes, each one representing a collection of input points. A graph cut is employed in the second phase to bisect the set of supernodes. Similarity between supernodes is computed as a weighted combination of geodesic distances and connectivity. Repeated application of the graph cut results in a hierarchical segmentation of the point input. In the last phase, a segmentation of the original point set is constructed by refining the segmentation of the supernodes based on their associated feature sizes. We apply our segmentation algorithm on laser-scanned models to evaluate its ability to capture geometric features in complex data sets.
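The graph-cut phase can be sketched with spectral bisection on a small supernode graph. This is not the paper's code: it uses the unnormalized Laplacian's Fiedler vector as a simple stand-in for the cut, and the edge weights here are arbitrary rather than the paper's blend of geodesic distance and connectivity.

```python
import numpy as np

def bisect_supernodes(W):
    """Spectral bisection of a weighted supernode graph: split along
    the sign of the Fiedler vector, the eigenvector for the
    second-smallest eigenvalue of the graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1] >= 0.0        # sign of the Fiedler vector gives the cut
```

Applied recursively to each side, this yields the hierarchical segmentation the paper describes.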


Numerical Computations with GPUs | 2014

Accelerating Numerical Dense Linear Algebra Calculations with GPUs

Jack J. Dongarra; Mark Gates; Azzam Haidar; Jakub Kurzak; Piotr Luszczek; Stanimire Tomov; Ichitaro Yamazaki

This chapter presents the current best design and implementation practices for the acceleration of dense linear algebra (DLA) on GPUs. Examples are given with fundamental algorithms, from the matrix-matrix multiplication kernel written in CUDA to the higher-level algorithms for solving linear systems, eigenvalue problems, and SVD problems. The implementations are available through the MAGMA library, a redesign for GPUs of the popular LAPACK. To generate the extreme level of parallelism needed for the efficient use of GPUs, algorithms of interest are redesigned and then split into well-chosen computational tasks. The tasks' execution is scheduled over the computational components of a hybrid system of multicore CPUs with GPU accelerators using either static scheduling or a light-weight runtime system. The use of light-weight runtime systems keeps scheduling overhead low, similar to static scheduling, while enabling the expression of parallelism through sequential-like code. This simplifies the development effort and allows the exploration of the unique strengths of the various hardware components.
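The task-splitting idea can be illustrated with a toy tiled matrix multiply scheduled over a thread pool. This is only a sketch of the scheduling concept: MAGMA's runtime schedules real BLAS tasks across CPU cores and GPUs with dependency tracking, and a Python thread pool is a hypothetical stand-in for that machinery.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def tiled_gemm(A, B, nb=2, workers=4):
    """C = A @ B decomposed into independent tile tasks
    C[i,j] = sum_k A[i,k] @ B[k,j], dispatched to a worker pool.
    Each task writes a disjoint tile of C, so no synchronization
    between tasks is needed."""
    n = A.shape[0]
    C = np.zeros((n, n))
    tiles = range(0, n, nb)

    def task(i, j):
        acc = np.zeros((min(nb, n - i), min(nb, n - j)))
        for k in tiles:
            acc += A[i:i + nb, k:k + nb] @ B[k:k + nb, j:j + nb]
        C[i:i + nb, j:j + nb] = acc

    with ThreadPoolExecutor(max_workers=workers) as ex:
        list(ex.map(lambda ij: task(*ij), [(i, j) for i in tiles for j in tiles]))
    return C
```

Algorithms with dependencies between tasks (factorizations, reductions) need the dependency-aware runtime the chapter describes rather than this embarrassingly parallel dispatch.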


Archive | 2009

Numerical Methods for Quantum Monte Carlo Simulations of the Hubbard Model

Zhaojun Bai; Wenbin Chen; R. T. Scalettar; Ichitaro Yamazaki

One of the core problems in materials science is how the interactions between electrons in a solid give rise to properties like magnetism, superconductivity, and metal-insulator transitions. Our ability to answer this central question of quantum statistical mechanics numerically is presently limited to systems of a few hundred electrons. While simulations at this scale have taught us a considerable amount about certain classes of materials, they have very significant limitations, especially for recently discovered materials which have mesoscopic magnetic and charge order. In this paper, we begin with an introduction to the Hubbard model and quantum Monte Carlo simulations. The Hubbard model is a simple and effective model that has successfully captured many of the qualitative features of materials such as transition metal monoxides and high-temperature superconductors. Because of the voluminous content, we are not able to cover all topics in detail; instead, we focus on explaining the basic ideas, concepts, and methodology of quantum Monte Carlo simulation, and leave various parts for further study. Parts of this paper present our recent work on numerical linear algebra methods for quantum Monte Carlo simulations.

1 Hubbard model and QMC simulations

The Hubbard model is a fundamental model for studying one of the core problems in materials science: how do the interactions between electrons in a solid give rise to properties like magnetism, superconductivity, and metal-insulator transitions? In this lecture, we introduce the Hubbard model and outline quantum Monte Carlo (QMC) simulations of many-electron systems. Subsequent lectures will describe the computational kernels of the QMC simulations.

1.1 Hubbard model

The two-dimensional Hubbard model [8, 9] we shall study is defined by the Hamiltonian

H = H_K + H_μ + H_V,   (1.1)

where H_K, H_μ, and H_V stand for the kinetic, chemical, and potential energy terms, respectively, and are defined as

H_K = −t Σ_{⟨i,j⟩,σ} (c†_{iσ} c_{jσ} + c†_{jσ} c_{iσ}),   H_μ = −μ Σ
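As a concrete illustration of the kinetic term H_K, the following sketch (ours, not from the lectures) builds the hopping matrix of the 2D Hubbard model on a periodic nx-by-ny lattice; its eigenvalues are the familiar band energies −2t(cos k_x + cos k_y).

```python
import numpy as np

def hubbard_kinetic(nx, ny, t=1.0):
    """Hopping matrix K of the 2D Hubbard model on an nx-by-ny periodic
    lattice: K[i, j] = -t for nearest-neighbor sites <i, j>. This is
    the one-particle matrix behind the kinetic term H_K."""
    n = nx * ny
    K = np.zeros((n, n))
    for x in range(nx):
        for y in range(ny):
            i = x * ny + y
            # right and up neighbors with periodic wrap-around;
            # left and down are covered by the symmetric assignment
            for j in (((x + 1) % nx) * ny + y, x * ny + (y + 1) % ny):
                K[i, j] = K[j, i] = -t
    return K
```

Matrices of this form (and their exponentials) are the building blocks of the determinant QMC linear algebra discussed later in the lectures.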


Parallel Computing | 2013

Performance comparison of parallel eigensolvers based on a contour integral method and a Lanczos method

Ichitaro Yamazaki; Hiroto Tadano; Tetsuya Sakurai; Tsutomu Ikegami

We study the performance of a parallel nonlinear eigensolver, SSEig, which is based on a contour integral method. We focus on symmetric generalized eigenvalue problems (GEPs) of computing interior eigenvalues. We chose to focus on GEPs because we can then compare the performance of SSEig with that of a publicly-available software package, TRLan, which is based on a thick-restart Lanczos method. To solve this type of problem, SSEig requires the solution of independent linear systems with different shifts, while TRLan solves a sequence of linear systems with a single shift. Therefore, while SSEig typically has a computational cost greater than that of TRLan, it also has greater parallel scalability. To compare the performance of these two solvers, in this paper, we develop performance models and present numerical results of solving large-scale eigenvalue problems arising from simulations modeling accelerator cavities. In particular, we identify the crossover point where SSEig becomes faster than TRLan. The parallel performance of SSEig solving nonlinear eigenvalue problems is also studied.
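A compact sketch of the contour-integral idea for a symmetric GEP follows. This is a Sakurai-Sugiura-style illustration, not the SSEig implementation: a quadrature of the resolvent filters a random block onto the eigenspace inside a circle, and the shifted solves at different quadrature points are mutually independent, which is the extra level of parallelism the paper exploits.

```python
import numpy as np

def contour_eigs(A, B, center, radius, nquad=64, m=8, seed=0):
    """Contour-integral eigensolver sketch for A x = lambda B x: a
    quadrature of (z B - A)^{-1} B V around a circle filters a random
    block V onto the interior eigenspace; Rayleigh-Ritz on the filtered
    basis then recovers the eigenvalues inside the contour."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((A.shape[0], m))
    S = np.zeros((A.shape[0], m))
    for q in range(nquad):
        z = center + radius * np.exp(2j * np.pi * (q + 0.5) / nquad)
        # the shifted systems are independent: solved on separate
        # process groups in a parallel implementation
        Y = np.linalg.solve(z * B - A, B @ V)
        S += ((z - center) / nquad * Y).real
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    Q = U[:, sv > 1e-8 * sv[0]]              # drop filtered-out directions
    ritz = np.sort(np.linalg.eigvals(
        np.linalg.solve(Q.T @ B @ Q, Q.T @ A @ Q)).real)
    return ritz[np.abs(ritz - center) < radius]
```

Every shifted solve costs a full (sparse) factorization, which is why the method is more expensive than a Lanczos sweep but scales to more processors.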

Collaboration


Dive into Ichitaro Yamazaki's collaborations.

Top Co-Authors

Jakub Kurzak

University of Tennessee


Mark Gates

University of Tennessee


Azzam Haidar

University of Tennessee


Hartwig Anzt

University of Tennessee


Mark Hoemmen

Sandia National Laboratories


Panruo Wu

University of Tennessee
