Ken Naono | Researchain

Publication

Featured research published by Ken Naono.

virtualization technologies in distributed computing | 2009

Investigating suitability for server virtualization using business application benchmarks

Tsuyoshi Tanaka; Toshiaki Tarui; Ken Naono

Server virtualization is now required for data center systems to reduce the number of servers. However, it is still unclear which business applications are suitable for virtualization. We present our evaluation results for four types of business application benchmarks on our virtualization system. The results show that the virtualized performance of a TPC-H workload, which mainly executes reference (read) queries on a database, reaches about 90% of the non-virtualized performance, and that it is better than that of the other benchmark applications. A new performance characteristic for virtualization indicates that application programs with performance bottlenecks in disk I/O and low CPU utilization in a non-virtualized environment are suitable for virtualization.

international conference on computer and information sciences | 2014

Performance comparison of least slack time based heuristics for job scheduling on computational grid

Ahmad Abba Haruna; Nordin Zakaria; Low T. Jung; Anindya Jyoti Pal; Ken Naono; Jun Okitsu

In recent years, increasing demand for computing has led to the development of computational grids. Scheduling problems on such grids are typically NP-hard, so no efficient algorithm for finding an optimal solution is known. The research reported here therefore focuses on the development of hybrid scheduling algorithms based on deadline and slack time parameters and their variations, using optimization techniques. An extensive performance comparison is presented using real workload traces as benchmarks in a computational grid environment. The results were compared with baseline scheduling approaches from the literature. They show that the grid scheduling algorithms developed and reported in this paper give good results in most cases and also scale well as the workload and the number of processors in the computational grid environment increase.
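
As a point of reference for the slack time parameter mentioned above, the following is a minimal sketch of plain least-slack-time ordering, not the hybrid heuristics of the paper, whose details the abstract does not give. The `Job` class, its fields, and the function names are hypothetical and chosen only for illustration. Slack is taken as the deadline minus the current time minus the estimated runtime.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    runtime: float   # estimated execution time (assumed known in advance)
    deadline: float  # absolute deadline, in the same time unit as runtime


def slack(job: Job, now: float) -> float:
    """Time the job can still wait before it would miss its deadline."""
    return job.deadline - now - job.runtime


def lst_order(jobs, now=0.0):
    """Return jobs sorted so that the job with the least slack comes first."""
    return sorted(jobs, key=lambda j: slack(j, now))


# Hypothetical example jobs (not taken from any workload trace).
jobs = [Job("a", 4.0, 10.0), Job("b", 2.0, 5.0), Job("c", 1.0, 9.0)]
order = [j.name for j in lst_order(jobs)]
print(order)
```

Job "b" has the smallest slack (3 time units) and is therefore scheduled first, followed by "a" (6) and "c" (8).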

ieee international conference on high performance computing data and analytics | 2000

High performance implementation of tridiagonalization on the SR8000

Ken Naono; Yusaku Yamamoto; Mitsuyoshi Igai; Hiroyuki Hirayama

Methods for high-performance tridiagonalization on the HITACHI SR8000 are described and evaluated. To achieve high performance, we adopted blocked tridiagonalization and the scattered square decomposition. In addition, to improve single-node performance, we used rectangular computation in the diagonal blocks and loop integration to reduce the number of read/write operations. On one node of the SR8000, we achieved about 4.0 Gflop/s in the tridiagonalization of a real symmetric matrix of dimension 4000. This is much better than the 2.9 Gflop/s of our matrix library on the HITACHI S-3800, which has the same peak performance as one node of the SR8000.

international conference on conceptual structures | 2013

A Sparse Matrix Library with Automatic Selection of Iterative Solvers and Preconditioners

Takao Sakurai; Takahiro Katagiri; Hisayasu Kuroda; Ken Naono; Mitsuyoshi Igai; Satoshi Ohshima

Many iterative solvers and preconditioners have recently been proposed for linear iterative matrix libraries. Currently, library users have to manually select the solvers and preconditioners to solve their target matrix. However, if they select the wrong combination of the two, they have to spend a lot of time on calculations or they cannot obtain the solution. Therefore, an approach for the automatic selection of solvers and preconditioners is needed. We have developed a function that automatically selects an effective solver/preconditioner combination by referencing the history of relative residuals at run-time to predict whether the solver will converge or stagnate. Numerical evaluation with 50 Florida matrices showed that the proposed function can select effective combinations for all matrices. This suggests that our function can play a significant role in sparse iterative matrix computations.
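
The abstract does not state the criterion used to predict convergence or stagnation from the residual history. The following is a minimal sketch of one possible check, under the assumption that a solver is treated as stagnating when the relative residual has shrunk by less than a fixed factor over the last few iterations. The function name, `window`, and `min_reduction` are hypothetical and are not taken from the library.

```python
def is_stagnating(residuals, window=5, min_reduction=0.9):
    """Heuristic stagnation check on a history of relative residuals.

    Returns True when the latest residual is still larger than
    min_reduction times the residual from `window` iterations earlier.
    Returns False while the history is too short to judge.
    """
    if len(residuals) <= window:
        return False
    old = residuals[-window - 1]
    new = residuals[-1]
    return new > min_reduction * old


# Hypothetical residual histories for illustration only.
converging = [0.5 ** k for k in range(10)]  # halves every iteration
flat = [1e-3] * 10                          # no progress at all

print(is_stagnating(converging), is_stagnating(flat))
```

A selection routine could call such a check periodically and switch to another solver/preconditioner combination when it returns True; how the actual library makes that decision is not described in the abstract.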

ieee international conference on high performance computing data and analytics | 2012

Control Formats for Unsymmetric and Symmetric Sparse Matrix–Vector Multiplications on OpenMP Implementations

Takahiro Katagiri; Takao Sakurai; Mitsuyoshi Igai; Satoshi Ohshima; Hisayasu Kuroda; Ken Naono; Kengo Nakajima

In this paper, we propose “control formats” to obtain better thread performance of sparse matrix–vector multiplication (SpMV) for unsymmetric and symmetric matrices. By using the control formats, we established the following maximum speedups of SpMV in 16-thread execution on one node of the T2K Open Supercomputer: (1) 7.14× for an unsymmetric matrix by using the proposed Branchless Segmented Scan compared to the original Segmented Scan method; (2) 12.7× for a symmetric matrix by using the proposed Zero-element Computation-free method compared to a simple SpMV implementation.
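
For orientation, a simple SpMV of the kind used as a baseline can be sketched as follows. This is a plain sequential product in compressed sparse row (CSR) storage, which is an assumption made here for illustration; it is not the proposed control formats, the Branchless Segmented Scan, or the Zero-element Computation-free method, none of which are specified in the abstract. The function name is hypothetical.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)
```

The outer loop over rows is the part that a threaded implementation would typically divide among threads; rows with very different numbers of nonzeros then cause load imbalance, which is one reason alternative formats are studied.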

international conference on computational science | 2014

Improvement in Performance of GMRES(m) Method by Applying a Genetic Algorithm to the Restart Process

Nobutoshi Sagawa; Norihisa Komoda; Ken Naono

This paper presents an approach for improving the efficiency of solving linear systems by applying a genetic algorithm (GA) to the GMRES(m) method, which is one of the most powerful solvers of large-scale asymmetric sparse matrices. For each restart process in GMRES(m), the initial vectors are regarded as chromosomes. When the restart process stagnates, the GA process performs a crossover on chromosomes to create new chromosomes for the next restart stage. An algorithm, which was inspired by the look-back type of the GMRES(m) method, was developed to effectively perform the crossover process. The proposed method was tested on five example matrices selected from the University of Florida sparse matrix collection. The results show that improvements in execution time ranged from 15% to 600%, depending on the nature of the matrix.

international conference on intelligent and advanced systems | 2012

A fully run-time auto-tuned sparse iterative solver with OpenATLib

Ken Naono; Takao Sakurai; Mitsuyoshi Igai; Takahiro Katagiri; Satoshi Ohshima; Shoji Itoh; Kengo Nakajima; Hisayasu Kuroda

We propose a general application programming interface called OpenATLib for auto-tuning (AT). OpenATLib is carefully designed to establish the reusability of AT functions for sparse iterative solvers. Using the APIs of OpenATLib, we develop a fully auto-tuned sparse iterative solver called Xabclib. Xabclib has several novel runtime AT functions. We also develop a numerical computation policy that can optimize memory space and computational accuracy. Using the above functions and policies, we obtain the following important findings: (1) average memory space is reduced to 1/45 under lower memory policies, and (2) fault convergence, in which a conventional solver judges the iteration to have converged although it has not actually converged with respect to the matrix before preconditioning, is avoided under higher accuracy policies. The results imply that policy-based runtime AT plays a significant role in sparse iterative matrix computations.

2012 IEEE Conference on Control, Systems & Industrial Informatics | 2012

Towards greening a campus grid: Free cooling during unsociable hours

Jun Okitsu; S. A. Sulaiman; Ken Naono; Nordin Zakaria; A. Oxley

Free cooling has become widely used in the area of computer room thermal control, especially where the temperature constraint can be relaxed. However, in tropical regions, such as Malaysia, free cooling cannot be applied all of the time. This paper presents a strategy for grids that allows jobs to be executed on resources that are free cooled. The paper describes how a recommended period of time for using free cooled resources is predicted. The period is predicted from a historical analysis by using machine learning. Experiments in a classroom used for campus grid computing showed that, typically, free cooled resources can be used for 5 hours per day, when the temperature is less than 28 degrees Celsius. The result is of use to those developing campus grids in tropical countries.

asia-pacific network operations and management symposium | 2011

Design and evaluation of integrated monitoring software for SaaS-based systems

Masahiro Yoshizawa; Ken Naono

A design of integrated monitoring software for understanding the detailed status of applications and hardware devices in a short period of time is proposed. Data models for real-time data and system configurations for easier associations between the real-time data are also proposed. To verify the effectiveness of the proposed integrated monitoring software, surveys of operations managers using monitoring software in a real data center were conducted. The results of the survey show that the integrated monitoring software can reduce the “transition time” (namely, the time an operations manager takes to switch from monitoring one set of real-time data to another) by about 82%. This result suggests that the integrated monitoring software is effective for reducing the cost of monitoring SaaS-based systems.

international symposium on parallel and distributed processing and applications | 2003

A vector-parallel FFT with a user-specifiable data distribution scheme

Yusaku Yamamoto; Mitsuyoshi Igai; Ken Naono

We propose a 1-dimensional FFT routine for distributed-memory vector-parallel machines which provides the user with both high performance and flexibility in data distribution. Our routine inputs/outputs data using block cyclic data distribution, and the block sizes for input and output can be specified independently by the user. This flexibility is realized with the same amount of inter-processor communication as the widely used transpose algorithm and no additional overhead for data redistribution is necessary. We implemented our method on the Hitachi SR2201, a distributed-memory parallel machine with pseudovector processing nodes, and obtained 45% of the peak performance on 16 nodes when the problem size is N = 2^24. This performance was unchanged for a wide range of block sizes from 1 to 16.
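
The block cyclic data distribution mentioned above can be illustrated with a small index-mapping sketch. It shows only the standard mapping from a global element index to an owning process and a local index, not the FFT routine itself; the function name and parameter names are hypothetical.

```python
def block_cyclic_owner(g, block, nprocs):
    """Map global index g to (owner process, local index).

    Elements are grouped into blocks of `block` consecutive entries, and
    the blocks are dealt out to processes 0, 1, ..., nprocs - 1 in
    round-robin order.
    """
    block_id = g // block                # which block g falls in
    owner = block_id % nprocs            # blocks are assigned cyclically
    local_block = block_id // nprocs     # position of that block on its owner
    local = local_block * block + g % block
    return owner, local


# 8 elements, block size 2, 2 processes.
layout = [block_cyclic_owner(g, block=2, nprocs=2) for g in range(8)]
print(layout)
```

With block size 1 this reduces to a purely cyclic distribution, and with block size N / nprocs it reduces to a plain block distribution, which is why a user-specifiable block size covers both common layouts.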
