Keita Teranishi
Sandia National Laboratories
Publication
Featured research published by Keita Teranishi.
Proceedings of the 21st European MPI Users' Group Meeting | 2014
Keita Teranishi; Michael A. Heroux
The current system reaction to the loss of a single MPI process is to kill all the remaining processes and restart the application from the most recent checkpoint. This approach will become infeasible for future extreme-scale systems. We address this issue using an emerging resilient computing model called Local Failure Local Recovery (LFLR) that provides application developers with the ability to recover locally and continue application execution when a process is lost. We discuss the design of our software framework to enable the LFLR model using MPI-ULFM and demonstrate a resilient version of MiniFE that achieves scalable recovery from process failures.
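As a rough illustration of the LFLR idea described above (the class and function names here are invented for the sketch, not the framework's API): each rank keeps a redundant copy of its recoverable state, and when one rank fails, only that rank rolls back while the others continue.

```python
# Hypothetical sketch of Local Failure Local Recovery (LFLR): each rank
# commits a local redundant copy of its state; on a process failure, only
# the failed rank's state is restored. Names are illustrative only.

class LocalStore:
    """Per-rank redundant storage for recoverable objects."""
    def __init__(self):
        self._copy = {}

    def commit(self, name, data):
        self._copy[name] = list(data)   # redundant copy (e.g., on a buddy node)

    def restore(self, name):
        return list(self._copy[name])

def run_with_lflr(n_ranks, fail_rank):
    stores = [LocalStore() for _ in range(n_ranks)]
    state = [[r] * 4 for r in range(n_ranks)]      # each rank's local vector
    for r in range(n_ranks):
        stores[r].commit("x", state[r])            # commit known-good state
    # ... computation proceeds; one rank is lost ...
    state[fail_rank] = None                        # simulated process failure
    # local recovery: only the failed rank rolls back; others keep going
    state[fail_rank] = stores[fail_rank].restore("x")
    return state
```

Contrast this with global restart, where every rank would roll back to the checkpoint regardless of which process failed.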
international conference on computational science | 2002
Sanjukta Bhowmick; Padma Raghavan; Keita Teranishi
Many fundamental problems in scientific computing have more than one solution method. It is not uncommon for alternative solution methods to represent different tradeoffs between solution cost and reliability. Furthermore, the performance of a solution method often depends on the numerical properties of the problem instance and thus can vary dramatically across application domains. In such situations, it is natural to consider the construction of a multi-method composite solver to potentially improve both the average performance and reliability. In this paper, we provide a combinatorial framework for developing such composite solvers. We provide analytical results for obtaining an optimal composite from a set of methods with normalized measures of performance and reliability. Our empirical results demonstrate the effectiveness of such optimal composites for solving large, sparse linear systems of equations.
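A small sketch of the composite-solver idea (my own illustration, not the paper's framework): given each method's cost and success probability, a classical exchange argument shows that trying methods in increasing cost/reliability ratio minimizes the expected cost of the composite; brute force over orderings confirms it on a toy instance.

```python
# Toy model of an optimal multi-method composite solver: method i costs t_i
# and succeeds with probability r_i; the composite tries methods in sequence
# until one succeeds. Ordering by t_i / r_i ascending minimizes expected cost.
from itertools import permutations

def expected_cost(seq):
    # E = t1 + (1-r1)*(t2 + (1-r2)*(...)), evaluated right to left
    total = 0.0
    for t, r in reversed(seq):
        total = t + (1.0 - r) * total
    return total

def optimal_composite(methods):
    # exchange argument: adjacent swap prefers i before j iff t_i*r_j < t_j*r_i
    return sorted(methods, key=lambda m: m[0] / m[1])

methods = [(4.0, 0.9), (1.0, 0.3), (2.0, 0.8)]   # (cost, reliability) pairs
best = min(permutations(methods), key=expected_cost)
assert expected_cost(list(best)) == expected_cost(optimal_composite(methods))
```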
Numerical Linear Algebra With Applications | 2003
Padma Raghavan; Keita Teranishi; Esmond G. Ng
SUMMARY Consider the solution of large sparse symmetric positive definite linear systems using the preconditioned conjugate gradient method. On sequential architectures, incomplete Cholesky factorizations provide effective preconditioning for systems from a variety of application domains, some of which may have widely differing preconditioning requirements. However, incomplete factorization based preconditioners are not considered suitable for multiprocessors. This is primarily because the triangular solution step required to apply the preconditioner (at each iteration) does not scale well due to the large latency of inter-processor communication. We propose a new approach to overcome this performance bottleneck by coupling incomplete factorization with a selective inversion scheme to replace triangular solutions by scalable matrix-vector multiplications. We discuss our algorithm, analyze its communication latency for model sparse linear systems, and provide empirical results on its performance and scalability. Copyright © 2003 John Wiley & Sons, Ltd.
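The core trade the abstract describes can be shown in a few lines (a pure-Python, dense, sequential illustration, not the paper's sparse parallel algorithm): a triangular solve is inherently sequential, but if the triangular factor is inverted once up front, applying the preconditioner becomes a matrix-vector product, which parallelizes well.

```python
# Selective-inversion idea in miniature: pay a one-time inversion cost so
# that each preconditioner application is a matvec instead of a solve.

def tri_solve(L, b):
    # forward substitution: the latency-bound step in a parallel setting
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        x[i] = (b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i]
    return x

def tri_invert(L):
    # one-time cost: columns of inv(L) via solves against unit vectors
    n = len(L)
    cols = [tri_solve(L, [1.0 if i == k else 0.0 for i in range(n)])
            for k in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]  # transpose

def matvec(A, b):
    return [sum(a * x for a, x in zip(row, b)) for row in A]

L = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0]]
b = [2.0, 5.0, 9.0]
Linv = tri_invert(L)
# the solve and the matvec against the precomputed inverse agree
assert all(abs(u - v) < 1e-12 for u, v in zip(tri_solve(L, b), matvec(Linv, b)))
```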
international conference on conceptual structures | 2015
Andrew A. Chien; Pavan Balaji; P. Beckman; Nan Dun; Aiman Fang; Hajime Fujita; Kamil Iskra; Zachary A. Rubenstein; Z. Zheng; R. Schreiber; J. Hammond; J. Dinan; Ignacio Laguna; D. Richards; A. Dubey; B. van Straalen; Mark Hoemmen; Michael A. Heroux; Keita Teranishi; Andrew R. Siegel
Abstract Exascale studies project reliability challenges for future high-performance computing (HPC) systems. We propose the Global View Resilience (GVR) system, a library that enables applications to add resilience in a portable, application-controlled fashion using versioned distributed arrays. We describe GVR's interfaces to distributed arrays, versioning, and cross-layer error recovery. Using several large applications (OpenMC, the preconditioned conjugate gradient solver PCG, ddcMD, and Chombo), we evaluate the programmer effort to add resilience. The required changes are small (< 2% LOC), localized, and machine-independent, requiring no software architecture changes. We also measure the overhead of adding GVR versioning and show that overheads below 2% are generally achieved. We conclude that GVR's interfaces and implementation are flexible and portable and create a gentle-slope path to tolerate growing error rates in future systems.
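A minimal sketch of the versioned-array idea in the spirit of GVR (the class and method names are illustrative, not the GVR API): the application creates a version at a known-good point and rolls back to it after detecting an error.

```python
# Versioned array sketch: snapshots are taken under application control and
# an earlier version can be restored after error detection.

class VersionedArray:
    def __init__(self, data):
        self.data = list(data)
        self.versions = []                      # list of snapshots

    def inc_version(self):
        self.versions.append(list(self.data))   # snapshot current state
        return len(self.versions) - 1           # version handle

    def restore(self, vid):
        self.data = list(self.versions[vid])

x = VersionedArray([0.0] * 4)
x.data = [1.0, 2.0, 3.0, 4.0]
v = x.inc_version()                 # known-good state
x.data = [float("nan")] * 4         # corruption, detected later by the app
x.restore(v)                        # cross-layer recovery: roll back one array
assert x.data == [1.0, 2.0, 3.0, 4.0]
```

The "gentle slope" claim corresponds to the small surface of this interface: an application only adds version and restore calls around the data it cares about.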
high performance computing systems and applications | 2014
Saurabh Hukerikar; Pedro C. Diniz; Robert F. Lucas; Keita Teranishi
As the scale and complexity of future High Performance Computing systems continue to grow, the rising frequency of faults and errors and their impact on HPC applications will make it increasingly difficult to accomplish useful computation. Traditional means of fault detection and correction are either hardware based or use software based redundancy. Redundancy based approaches usually entail complete replication of the program state or the computation and therefore incur substantial application performance overhead. The wide-scale use of full redundancy in future exascale-class systems is therefore not a viable solution for error detection and correction. In this paper we present an application-level fault detection approach based on adaptive redundant multithreading. Through a language-level directive, the programmer can define structured code blocks. When these blocks are executed by multiple threads and their outputs compared, we can detect errors in specific parts of the program state that ultimately determine the correctness of the application outcome. The compiler outlines such code blocks, and a runtime system decides whether their execution by redundant threads should be enabled or disabled by continuously observing and learning about the fault-tolerance state of the system. By providing flexible building blocks for application-specific fault detection, our approach achieves more reasonable performance overheads than full redundancy. Our results show that the overheads to application performance range from 4% to 70%, because the runtime system is continuously aware of the rate and source of system faults, rather than the overhead in excess of 100% incurred by complete replication.
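The mechanism can be sketched as follows (an illustration of redundant multithreading with an on/off switch, not the paper's compiler or runtime): a marked block is executed by N redundant threads and the outputs are compared; the runtime can disable redundancy when the observed fault rate is low.

```python
# Adaptive redundant multithreading sketch: run a code block in redundant
# threads, vote on the outputs, and allow redundancy to be switched off.
import threading

def detect_errors(block, arg, replicas=2, enabled=True):
    if not enabled:                      # runtime disables redundancy adaptively
        return block(arg), True
    results = [None] * replicas
    def worker(i):
        results[i] = block(arg)
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(replicas)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    agree = all(r == results[0] for r in results)
    return results[0], agree             # (value, no-error-detected flag)

value, ok = detect_errors(lambda x: sum(range(x)), 10)
assert value == 45 and ok
```

A real implementation would compare only the designated outputs of the outlined block, which is why errors can be localized to the parts of the program state that matter for correctness.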
ieee international conference on high performance computing data and analytics | 2015
Thomas Herault; Aurelien Bouteiller; George Bosilca; Marc Gamell; Keita Teranishi; Manish Parashar; Jack J. Dongarra
The ability to consistently handle faults in a distributed environment requires, among a small set of basic routines, an agreement algorithm allowing surviving entities to reach a consensual decision among a bounded set of volatile resources. This paper presents an algorithm that implements an Early Returning Agreement (ERA) in pseudo-synchronous systems, which optimistically allows a process to resume its activity while guaranteeing strong progress. We prove the correctness of our ERA algorithm and expose its logarithmic behavior, which is an extremely desirable property for any algorithm that targets future exascale platforms. We detail a practical implementation of this consensus algorithm in the context of an MPI library, and evaluate both its efficiency and scalability through a set of benchmarks and two fault-tolerant scientific applications.
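The logarithmic behavior comes from structuring the agreement as a tree. A heavily simplified sketch (this omits the failure handling during the agreement that is the hard part of ERA): each surviving process contributes a flag, values are combined up a binary tree in O(log n) rounds, and the decision is then broadcast back down.

```python
# Tree-structured agreement round in miniature: combine per-process votes
# pairwise with doubling stride, so n processes need ceil(log2(n)) rounds.

def tree_agree(flags):
    values = list(flags)
    rounds = 0
    stride = 1
    while stride < len(values):       # combine up the tree
        for i in range(0, len(values), 2 * stride):
            if i + stride < len(values):
                values[i] = values[i] and values[i + stride]
        stride *= 2
        rounds += 1
    return values[0], rounds          # decision, number of rounds

decision, rounds = tree_agree([True] * 16)
assert decision is True and rounds == 4   # log2(16) combining rounds

decision, _ = tree_agree([True] * 7 + [False] + [True] * 8)
assert decision is False                  # any dissenting vote dominates
```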
ieee international conference on high performance computing data and analytics | 2015
Marc Gamell; Keita Teranishi; Michael A. Heroux; Jackson R. Mayo; Hemanth Kolla; Jacqueline H. Chen; Manish Parashar
Application resilience is a key challenge that has to be addressed to realize the exascale vision. Online recovery, even when it involves all processes, can dramatically reduce the overhead of failures as compared to the more traditional approach where the job is terminated and restarted from the last checkpoint. In this paper we explore how local recovery can be used for certain classes of applications to further reduce overheads due to resilience. Specifically we develop programming support and scalable runtime mechanisms to enable online and transparent local recovery for stencil-based parallel applications on current leadership class systems. We also show how multiple independent failures can be masked to effectively reduce the impact on the total time to solution. We integrate these mechanisms with the S3D combustion simulation, and experimentally demonstrate (using the Titan Cray-XK7 system at ORNL) the ability to tolerate high failure rates (i.e., node failures every 5 seconds) with low overhead while sustaining performance, at scales up to 262144 cores.
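A back-of-the-envelope model of why masking multiple independent failures helps (my illustration of the claim, not the paper's analysis): with global restart, every failure costs a full rollback, while local recoveries whose time windows overlap are masked into a single delay.

```python
# Overhead model: global restart pays per failure; local recovery pays only
# for the union of the (possibly overlapping) recovery intervals.

def global_restart_overhead(failure_times, rollback_cost):
    return len(failure_times) * rollback_cost

def local_recovery_overhead(failure_times, local_cost):
    # total length of the union of intervals [t, t + local_cost]
    intervals = sorted((t, t + local_cost) for t in failure_times)
    total, cur_start, cur_end = 0.0, None, None
    for s, e in intervals:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)   # overlapping recoveries are masked
    if cur_end is not None:
        total += cur_end - cur_start
    return total

fails = [0.0, 1.0, 10.0]                 # two failures close together
assert local_recovery_overhead(fails, 5.0) == 11.0   # 6 + 5, not 15
assert global_restart_overhead(fails, 60.0) == 180.0
```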
SIAM Journal on Scientific Computing | 2010
Padma Raghavan; Keita Teranishi
We consider parallel preconditioning to solve large sparse linear systems Ax=b using conjugate gradients when
high performance distributed computing | 2015
Marc Gamell; Keita Teranishi; Michael A. Heroux; Jackson R. Mayo; Hemanth Kolla; Jacqueline H. Chen; Manish Parashar
ieee high performance extreme computing conference | 2014
Saurabh Hukerikar; Keita Teranishi; Pedro C. Diniz; Robert F. Lucas
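The SIAM 2010 abstract above breaks off mid-sentence, but its setting is the conjugate gradient method for Ax=b with A symmetric positive definite. As a minimal illustration of that setting (pure Python, dense, unpreconditioned; not the paper's parallel preconditioner):

```python
# Unpreconditioned conjugate gradient for Ax = b with A symmetric positive
# definite; converges in at most n iterations in exact arithmetic.

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(A, b, tol=1e-10, max_iter=100):
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A*x with x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol * tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]        # SPD test matrix
b = [1.0, 2.0]
x = cg(A, b)
assert all(abs(u - v) < 1e-8 for u, v in zip(matvec(A, x), b))
```

Each iteration is dominated by one matrix-vector product and a few dot products; a preconditioned variant adds one preconditioner application per iteration, which is exactly the step the selective-inversion work targets.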