
Publication


Featured research published by Yusaku Yamamoto.


Operations Research | 2005

A Double-Exponential Fast Gauss Transform Algorithm for Pricing Discrete Path-Dependent Options

Mark Broadie; Yusaku Yamamoto

This paper develops algorithms for the pricing of discretely sampled barrier, lookback, and hindsight options and discretely exercisable American options. Under the Black-Scholes framework, the pricing of these options can be reduced to the evaluation of a series of convolutions of the Gaussian distribution with a known function. We compute these convolutions efficiently using the double-exponential integration formula and the fast Gauss transform. The resulting algorithms have computational complexity of O(nN), where n is the number of monitoring/exercise dates and N is the number of sample points at each date, and our results show the error decreases exponentially with N. We also extend the approach and provide results for Merton's lognormal jump-diffusion model.
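The convolution structure described above can be sketched directly: the following prices a discretely monitored down-and-out call under Black-Scholes by backward induction over the monitoring dates, using naive O(nN^2) quadrature rather than the paper's O(nN) double-exponential/fast-Gauss-transform scheme. All parameter values are illustrative.

```python
import numpy as np

# Backward induction V_k(x) = e^{-r dt} * Integral G(x - y) V_{k+1}(y) dy,
# with the integral taken over log-prices y above the barrier (the value
# below the barrier is zero, which enforces the knock-out at each date).
S0, K, H = 100.0, 100.0, 90.0          # spot, strike, barrier (H < S0)
r, sigma, T, n = 0.05, 0.2, 0.5, 10    # rate, volatility, maturity, dates
dt = T / n
N = 400
lo = np.log(H / S0)                     # knock-out below this log-price
x = np.linspace(lo, 4 * sigma * np.sqrt(T), N)
dx = x[1] - x[0]

mu, sd = (r - 0.5 * sigma**2) * dt, sigma * np.sqrt(dt)
G = np.exp(-((x[:, None] - x[None, :] - mu) ** 2) / (2 * sd**2))
G /= sd * np.sqrt(2 * np.pi)            # Gaussian transition kernel

V = np.maximum(S0 * np.exp(x) - K, 0.0)  # call payoff at maturity
for _ in range(n):
    V = np.exp(-r * dt) * dx * (G @ V)   # one discounted convolution per date

price = float(np.interp(0.0, x, V))      # option value at S = S0
```

Each step applies the same Gaussian kernel matrix, which is exactly the repeated discrete Gauss transform that the paper's algorithm accelerates.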


Journal of Computational and Applied Mathematics | 2011

A block IDR(s) method for nonsymmetric linear systems with multiple right-hand sides

Lei Du; Tomohiro Sogabe; Bo Yu; Yusaku Yamamoto; Shao-Liang Zhang

IDR(s), based on the induced dimension reduction (IDR) theorem, is a new class of efficient algorithms for large nonsymmetric linear systems. IDR(1) is mathematically equivalent to BiCGStab at the even IDR(1) residuals, and IDR(s) with s>1 is competitive with most Bi-CG based methods. For these reasons, we extend IDR(s) to solve large nonsymmetric linear systems with multiple right-hand sides. In this paper, a variant of the IDR theorem is given first; then the block IDR(s), an extension of IDR(s) based on this variant, is proposed. By analysis, the upper bound on the number of matrix-vector products of block IDR(s) is the same as that of IDR(s) for a single right-hand side in the generic case; i.e., the total number of matrix-vector products of IDR(s) may be m times that of block IDR(s), where m is the number of right-hand sides. Numerical experiments are presented to show the effectiveness of the proposed method.
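As a baseline for the multiple right-hand-side setting, the sketch below solves AX = B column by column with a plain BiCGStab and counts matrix-vector products, the cost that a block method such as block IDR(s) can reduce by up to a factor of m. BiCGStab stands in here because of its relation to IDR(1) noted above; this is not the block IDR(s) algorithm itself.

```python
import numpy as np

def bicgstab(A, b, tol=1e-10, maxit=500):
    """Plain BiCGStab; returns the solution and the matvec count."""
    x = np.zeros_like(b)
    r = b.copy()                     # residual for x = 0
    r0 = r.copy()                    # shadow residual
    rho_old = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    nmv = 0
    for _ in range(maxit):
        rho = r0 @ r
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p; nmv += 1
        alpha = rho / (r0 @ v)
        s = r - alpha * v
        t = A @ s; nmv += 1
        omega = (t @ s) / (t @ t)
        x = x + alpha * p + omega * s
        r = s - omega * t
        rho_old = rho
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x, nmv

rng = np.random.default_rng(0)
n, m = 50, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominant test matrix
B = rng.standard_normal((n, m))
X = np.empty_like(B)
total_mv = 0
for k in range(m):                   # one independent solve per column
    X[:, k], nmv = bicgstab(A, B[:, k])
    total_mv += nmv
```

In the generic case described above, block IDR(s) would need roughly total_mv / m matrix-vector products for all m columns together.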


Parallel Computing Technologies | 2007

Accelerating the singular value decomposition of rectangular matrices with the CSX600 and the Integrable SVD

Yusaku Yamamoto; Takeshi Fukaya; Takashi Uneyama; Masami Takata; Kinji Kimura; Masashi Iwasaki; Yoshimasa Nakamura

We propose an approach to speed up the singular value decomposition (SVD) of very large rectangular matrices using the CSX600 floating point coprocessor. The CSX600-based acceleration board we use offers 50 GFLOPS of sustained performance, which is many times greater than that provided by standard microprocessors. However, this performance can be achieved only when a vendor-supplied matrix-matrix multiplication routine is used and the matrix size is sufficiently large. In this paper, we optimize two of the major components of rectangular SVD, namely, QR decomposition of the input matrix and back-transformation of the left singular vectors by the matrix Q, so that large-size matrix multiplications can be used efficiently. In addition, we use the Integrable SVD algorithm to compute the SVD of an intermediate bidiagonal matrix. This helps to further speed up the computation and reduce the memory requirements. As a result, we achieved up to 3.5 times speedup over the Intel Math Kernel Library running on a 3.2 GHz Xeon processor when computing the SVD of a 100,000 × 4000 matrix.
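The structure described above can be sketched as follows for a tall matrix A (m >> n): QR-decompose A, take the small SVD of R, and back-transform the left singular vectors by Q. Both the QR step and the back-transformation are rich in large matrix-matrix products, which is what the coprocessor accelerates. In this sketch np.linalg.svd stands in for the Integrable SVD used for the bidiagonal stage.

```python
import numpy as np

def tall_svd(A):
    """SVD of a tall matrix via QR pre-processing."""
    Q, R = np.linalg.qr(A)           # A = Q R,  Q: m x n,  R: n x n
    Ur, s, Vt = np.linalg.svd(R)     # small n x n SVD
    return Q @ Ur, s, Vt             # back-transformation: U = Q Ur

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 40))
U, s, Vt = tall_svd(A)
```

The cost of the n x n SVD is negligible next to the QR and back-transformation when m >> n, so accelerating the matrix-multiply-heavy parts dominates the total speedup.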


IEEE International Conference on High Performance Computing, Data and Analytics | 2005

Performance modeling and optimal block size selection for a BLAS-3 based tridiagonalization algorithm

Yusaku Yamamoto

We construct a performance model for Bischof & Wu's tridiagonalization algorithm, which is fully based on the level-3 BLAS. The model has a hierarchical structure, which reflects the hierarchical structure of the original algorithm, and, given the matrix size, the two block sizes, and the performance data of the underlying BLAS routines, predicts the execution time of the algorithm. Experiments on the Opteron and Alpha 21264A processors show that the model is quite accurate and can predict the performance of the algorithm for matrix sizes from 1920 to 7680 and for various block sizes with relative errors below 10%. The model will serve as a key component of an automatically tuned library that selects the optimal block sizes itself. It can also be used in a grid environment to help the user find which of the available machines will solve his/her problem in the shortest time.
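Model-based block-size selection of the kind described above can be sketched as follows. The cost function and rates below are made-up stand-ins for illustration; the paper's model is hierarchical and calibrated from measured BLAS timings.

```python
# Hypothetical model: total flops split into a level-2 (panel) part at a
# fixed rate and a level-3 (blocked update) part whose rate grows with
# the block size b, as measured GEMM rates typically do.
def predict_time(n, b, gemm_rate):
    total_flops = (4.0 / 3.0) * n**3        # ~cost of tridiagonalization
    level3_frac = b / (b + 32.0)            # hypothetical: grows with b
    t3 = total_flops * level3_frac / gemm_rate(b)
    t2 = total_flops * (1.0 - level3_frac) / 1.0e9  # fixed level-2 rate
    return t2 + t3

gemm_rate = lambda b: 8.0e9 * b / (b + 64.0)  # hypothetical measured curve
candidates = [8, 16, 32, 64, 128]
best = min(candidates, key=lambda b: predict_time(4000, b, gemm_rate))
```

An automatically tuned library would evaluate such a model over the candidate block sizes at install time and cache the argmin, instead of timing full factorizations for every size.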


International Conference on Cluster Computing | 2008

A large-grained parallel algorithm for nonlinear eigenvalue problems and its implementation using OmniRPC

Takeshi Amako; Yusaku Yamamoto; Shao-Liang Zhang

The nonlinear eigenvalue problem plays an important role in various fields such as nonlinear elasticity, electronic structure calculation and theoretical fluid dynamics. We recently proposed a new algorithm for the nonlinear eigenvalue problem, which reduces the original problem to a smaller generalized linear eigenvalue problem with Hankel coefficient matrices through a complex contour integral. This algorithm has the unique feature that it can find all the eigenvalues within a closed curve on the complex plane. Moreover, it has large-grain parallelism and is suited for execution in a grid environment. In this paper, we study the numerical properties of our algorithm theoretically. In particular, we analyze the effect of numerical integration on the computed eigenvalues and give a guideline on how to choose the size of the Hankel matrices properly. Also, we show the parallel performance of our algorithm implemented on a PC cluster using OmniRPC, a grid RPC system. Parallel efficiency of 75% is achieved when solving a nonlinear eigenvalue problem of order 1000 using 14 processors.
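The contour-integral idea can be sketched on the linear case T(z) = zI - A as a sanity check (the method itself handles general nonlinear T(z)). Moments mu_k = (1/(2*pi*i)) * contour integral of z^k u^H T(z)^{-1} v dz are approximated by the trapezoidal rule on a circle; the eigenvalues inside the circle fall out of a small Hankel matrix pencil. Note that each quadrature point needs one independent linear solve, which is exactly the large-grain parallelism mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([0.2, 0.5, 2.0, 3.0])      # two eigenvalues inside |z| < 1
u = rng.standard_normal(4)
v = rng.standard_normal(4)

Nq, radius, K = 64, 1.0, 2             # quadrature points, contour, Hankel size
theta = 2 * np.pi * np.arange(Nq) / Nq
z = radius * np.exp(1j * theta)
mu = np.zeros(2 * K, dtype=complex)
for zj, th in zip(z, theta):
    f = u @ np.linalg.solve(zj * np.eye(4) - A, v)   # u^H T(z)^{-1} v
    for k in range(2 * K):
        mu[k] += (radius / Nq) * np.exp(1j * th) * zj**k * f

H0 = np.array([[mu[i + j] for j in range(K)] for i in range(K)])
H1 = np.array([[mu[i + j + 1] for j in range(K)] for i in range(K)])
eigs = np.linalg.eigvals(np.linalg.solve(H0, H1))    # pencil (H1, H0)
```

The Hankel size K must match (or bound) the number of eigenvalues inside the contour; choosing it is the guideline question the paper analyzes.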


International Conference on Cluster Computing | 2008

A dynamic programming approach to optimizing the blocking strategy for the Householder QR decomposition

Takeshi Fukaya; Yusaku Yamamoto; Shao-Liang Zhang

In this paper, we present a new approach to optimizing the blocking strategy for the Householder QR decomposition. In high performance implementations of the Householder QR algorithm, it is common to use a blocking technique for the efficient use of the cache memory. There are several well-known blocking strategies, such as fixed-size blocking and recursive blocking, and usually their parameters, such as the block size and the recursion level, are tuned according to the target machine and the problem size. However, strategies generated by this kind of parameter optimization constitute only a small fraction of all possible blocking strategies. Given the complex performance characteristics of modern microprocessors, non-standard strategies may prove effective on some machines. Considering this situation, we first propose a new universal model that can express a far larger class of blocking strategies than has been considered so far. Next, we give an algorithm to find a near-optimal strategy from this class using dynamic programming. As a result of this approach, we found an effective blocking strategy that had not been reported before. Performance evaluation on the Opteron and Core2 processors shows that our strategy achieves about 1.2 times speedup over recursive blocking when computing the QR decomposition of a 6000 × 6000 matrix.
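The dynamic-programming idea can be sketched with a toy cost model: let cost(j, b) be the modeled time to factor a width-b panel starting at column j and update the trailing matrix, and let C(j) be the minimal modeled cost of finishing columns j..n. The cost function below is a made-up stand-in; the paper's model is calibrated to the machine.

```python
import functools

n = 512
sizes = [8, 16, 32, 64]

def cost(j, b):
    m = n - j                              # remaining columns
    panel = m * b * b                      # panel factorization term
    update = m * m * b / (1.0 + b / 16.0)  # blocked update, cheaper per
    return panel + update                  # flop for larger b

@functools.lru_cache(maxsize=None)
def C(j):
    """Minimal cost of columns j..n and the block sequence achieving it."""
    if j >= n:
        return 0.0, ()
    best = None
    for b in sizes:
        bb = min(b, n - j)                 # last block may be truncated
        rest_cost, rest_seq = C(j + bb)
        cand = (cost(j, bb) + rest_cost, (bb,) + rest_seq)
        if best is None or cand[0] < best[0]:
            best = cand
    return best

total_cost, schedule = C(0)
```

Because the DP searches over arbitrary block-size sequences, its optimum can only match or beat any fixed-size or recursive schedule expressible in the same class.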


International Parallel and Distributed Processing Symposium | 2006

Efficient parallel implementation of a weather derivatives pricing algorithm based on the fast Gauss transform

Yusaku Yamamoto

CDD weather derivatives are widely used to hedge weather risks, and their fast and accurate pricing is an important problem in financial engineering. In this paper, we propose an efficient parallelization strategy for a pricing algorithm for CDD derivatives. The algorithm uses the fast Gauss transform to compute the expected payoff of the derivative and has proved faster and more accurate than the conventional Monte Carlo method. However, speeding up the algorithm on a distributed-memory parallel computer is not straightforward because naive parallelization requires a large amount of inter-processor communication. Our new parallelization strategy exploits the structure of the fast Gauss transform and thereby reduces the amount of inter-processor communication considerably. Numerical experiments show that our strategy achieves up to 50% performance improvement over the naive one on a 16-node Mac G5 cluster and can compute the price of a representative CDD derivative in 7 seconds. This speed is adequate for almost any application.
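The kernel computation here is a discrete Gauss transform, G(y_j) = sum_i q_i exp(-(y_j - x_i)^2 / delta). A simple way to limit communication on p processors is to partition the sources: each processor forms a full-length partial sum locally and a single reduction combines them, as the sketch below simulates. This conveys only the flavor of the idea; the paper's strategy exploits the internal structure of the fast Gauss transform itself.

```python
import numpy as np

rng = np.random.default_rng(1)
x, q = rng.standard_normal(1000), rng.standard_normal(1000)  # sources, weights
y = np.linspace(-2, 2, 200)                                  # targets
delta = 0.1

def direct(xs, qs):
    """Direct O(MN) Gauss transform of sources xs with weights qs at targets y."""
    return (qs[None, :] * np.exp(-((y[:, None] - xs[None, :]) ** 2) / delta)).sum(axis=1)

p = 4
chunks = np.array_split(np.arange(1000), p)
partials = [direct(x[c], q[c]) for c in chunks]  # local work, no communication
G = np.sum(partials, axis=0)                     # one all-reduce at the end
```

Summing the per-processor partial transforms once at the end replaces the per-point exchanges a naive target-partitioned scheme would need.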


Concurrency and Computation: Practice and Experience | 2017

Performance analysis and optimization of the parallel one-sided block Jacobi SVD algorithm with dynamic ordering and variable blocking

Shuhei Kudo; Yusaku Yamamoto; Martin Bečka; Marián Vajteršic

The one-sided block Jacobi (OSBJ) method is known to be an efficient method for computing the singular value decomposition on a parallel computer. In this paper, we focus on the most recent variant of the OSBJ method, the one with parallel dynamic ordering and variable blocking, and present both theoretical and experimental analyses of the algorithm. In the first part of the paper, we provide a detailed theoretical analysis of its convergence properties. In the second part, based on preliminary performance measurements on the Fujitsu FX10 and SGI Altix ICE parallel computers, we identify two performance bottlenecks of the algorithm and propose new implementations to resolve them. Experimental results show that they are effective and can achieve up to 1.8 and 1.4 times speedup of the total execution time on the FX10 and the Altix ICE, respectively. Comparison with the ScaLAPACK SVD routine PDGESVD shows that our OSBJ solver is efficient when solving small to medium sized problems (n < 10000) using a modest number (< 100) of computing nodes.
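The principle behind the OSBJ method can be illustrated with a minimal one-sided Jacobi SVD using scalar rotations and cyclic ordering: rotate pairs of columns of A until all columns are mutually orthogonal, at which point the column norms are the singular values. The paper's method works on blocks of columns with dynamic ordering, which this sketch omits.

```python
import numpy as np

def one_sided_jacobi(A, sweeps=30, eps=1e-12):
    """Singular values of A via scalar one-sided Jacobi rotations."""
    U = A.astype(float).copy()
    n = U.shape[1]
    for _ in range(sweeps):
        off = 0.0
        for i in range(n - 1):
            for j in range(i + 1, n):
                a = U[:, i] @ U[:, i]
                b = U[:, j] @ U[:, j]
                c = U[:, i] @ U[:, j]
                off = max(off, abs(c) / np.sqrt(a * b))
                if abs(c) > eps * np.sqrt(a * b):
                    # rotation angle that zeroes the (i, j) inner product
                    zeta = (b - a) / (2.0 * c)
                    t = np.sign(zeta) / (abs(zeta) + np.hypot(1.0, zeta))
                    cs = 1.0 / np.hypot(1.0, t)
                    sn = cs * t
                    ui, uj = U[:, i].copy(), U[:, j].copy()
                    U[:, i] = cs * ui - sn * uj
                    U[:, j] = sn * ui + cs * uj
        if off < eps:
            break
    return np.sort(np.linalg.norm(U, axis=0))[::-1]

rng = np.random.default_rng(3)
M = rng.standard_normal((8, 5))
sv = one_sided_jacobi(M)
```

In the block version, "columns" become column blocks, the rotation becomes a small SVD of a 2-block subproblem, and the dynamic ordering picks the block pairs with the largest off-diagonal coupling first.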


Numerical Algorithms | 2015

A new subtraction-free formula for lower bounds of the minimal singular value of an upper bidiagonal matrix

Takumi Yamashita; Kinji Kimura; Yusaku Yamamoto

Traces of inverse powers of a positive definite symmetric tridiagonal matrix give lower bounds on the minimal singular value of an upper bidiagonal matrix. In a preceding work, a formula for the traces, which computes the diagonal entries of the inverse powers, was presented. In this paper, we present another formula for the traces, based on an idea quite different from that of the preceding work. An efficient implementation of the formula for practical use is also presented.
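The bound behind these formulas can be checked numerically: for an upper bidiagonal B, the matrix B^T B is positive definite tridiagonal and Tr((B^T B)^{-p}) = sum_i sigma_i^{-2p} >= sigma_min^{-2p}, hence sigma_min >= Tr((B^T B)^{-p})^{-1/(2p)}, with the bound tightening as p grows. The trace is computed naively via a dense inverse here; the point of the paper's formula is a subtraction-free recurrence that avoids exactly this.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
# random upper bidiagonal matrix with positive diagonal
B = np.diag(rng.uniform(1.0, 2.0, n)) + np.diag(rng.uniform(0.0, 1.0, n - 1), 1)
sigma_min = np.linalg.svd(B, compute_uv=False)[-1]

Tinv = np.linalg.inv(B.T @ B)      # naive; the paper avoids forming this
bounds = [np.trace(np.linalg.matrix_power(Tinv, p)) ** (-1.0 / (2 * p))
          for p in (1, 2, 3)]
```

Such lower bounds are used, for example, to choose safe shifts in dqds-type singular value iterations.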


Numerical Algorithms | 2012

Error analysis for matrix eigenvalue algorithm based on the discrete hungry Toda equation

Akiko Fukuda; Yusaku Yamamoto; Masashi Iwasaki; Emiko Ishiwata; Yoshimasa Nakamura

Based on the integrable discrete hungry Toda (dhToda) equation, the authors designed an algorithm for computing eigenvalues of a class of totally nonnegative matrices (Ann Mat Pura Appl, doi:10.1007/s10231-011-0231-0). This is named the dhToda algorithm, and can be regarded as an extension of the well-known qd algorithm. The shifted dhToda algorithm has also been designed by introducing the origin shift in order to accelerate convergence. In this paper, we first propose the differential form of the shifted dhToda algorithm, by referring to the differential form of the qd with shifts (dqds) algorithm. The number of subtractions is then reduced and the effect of cancellation in floating point arithmetic is minimized. Next, from the viewpoint of mixed error analysis, we investigate the numerical stability of the proposed algorithm in floating point arithmetic. Based on this result, we give a relative perturbation bound for eigenvalues computed by the new algorithm. Thus it is verified that the eigenvalues computed by the proposed algorithm have high relative accuracy. Numerical examples agree with our error analysis for the algorithm.
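The dqds recurrence referenced above can be sketched in its unshifted form (the s = 0 case, often written dqd). For an upper bidiagonal matrix with diagonal a_i and superdiagonal b_i, set q_i = a_i^2 and e_i = b_i^2; each sweep below is subtraction-free, and q converges to the squared singular values in decreasing order. This is the classical algorithm the dhToda algorithm extends, not the dhToda algorithm itself.

```python
import numpy as np

def dqd_sweep(q, e):
    """One unshifted dqd sweep on the qd variables (q, e)."""
    n = len(q)
    qh, eh = np.empty(n), np.empty(n - 1)
    d = q[0]
    for i in range(n - 1):
        qh[i] = d + e[i]               # no subtractions anywhere:
        eh[i] = e[i] * q[i + 1] / qh[i]  # high relative accuracy for
        d = d * q[i + 1] / qh[i]         # positive data
    qh[-1] = d
    return qh, eh

a, b = np.array([3.0, 2.0, 1.0]), np.array([1.0, 1.0])
q, e = a**2, b**2
for _ in range(200):
    q, e = dqd_sweep(q, e)             # e -> 0, q -> squared singular values
```

The shifted variants (dqds, and the differential shifted dhToda form proposed in the paper) add an origin shift to these recurrences to accelerate the linear convergence seen here.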

Collaboration


Dive into Yusaku Yamamoto's collaborations.

Top Co-Authors

Masashi Iwasaki (Kyoto Prefectural University)
Akiko Fukuda (Tokyo University of Science)
Emiko Ishiwata (Tokyo University of Science)
Shuhei Kudo (University of Electro-Communications)