Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where Kenichi Hagihara is active.

Publication


Featured research published by Kenichi Hagihara.


Symposium on Applications and the Internet | 2004

A comparison among grid scheduling algorithms for independent coarse-grained tasks

Noriyuki Fujimoto; Kenichi Hagihara

The most common objective function of task scheduling problems is makespan. However, on a computational grid, the second-best makespan may be much longer than the optimal makespan because the computing power of a grid varies over time. So, if the performance measure is makespan, there is no approximation algorithm in general for scheduling onto a grid. In contrast, the authors recently proposed the computing power consumed by a schedule as a criterion of the schedule and, for that criterion, gave a (1 + m(log_e(m-1) + 1)/n)-approximation algorithm, RR, for scheduling n independent coarse-grained tasks of the same length onto a grid with m processors. RR does not use any prediction information on the underlying resources and is the first approximation algorithm for grid scheduling. However, no performance comparison among related heuristic algorithms has been given so far. This paper shows experimental results comparing the computing power consumed by the schedules produced by RR and by five related algorithms. It turns out that RR is second only to the best of the algorithms that need prediction information on processor speeds and task lengths, even though RR does not require such information.
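The abstract does not reproduce RR itself; the following is a minimal Python sketch of prediction-free dynamic scheduling in the same spirit, where each idle processor simply receives the next waiting task. The function finish_time and all constants are hypothetical, and any task replication performed by the published RR is omitted.

```python
import heapq

def dynamic_dispatch(num_tasks, num_procs, finish_time):
    """Prediction-free dynamic dispatch (illustrative sketch, not the published RR).

    num_tasks   -- number of identical independent tasks (assumed >= num_procs)
    num_procs   -- number of grid processors
    finish_time -- finish_time(proc, start): completion time of one task
                   started on `proc` at time `start`; models a processor
                   whose speed varies over time (hypothetical callback)
    Returns the makespan of the produced schedule.
    """
    pool = num_tasks - num_procs           # tasks still waiting
    # Every processor starts one task at time 0.
    events = [(finish_time(p, 0.0), p) for p in range(num_procs)]
    heapq.heapify(events)
    makespan = 0.0
    while events:
        t, p = heapq.heappop(events)       # earliest task completion
        makespan = max(makespan, t)
        if pool > 0:                       # idle processor takes the next task
            pool -= 1
            heapq.heappush(events, (finish_time(p, t), p))
    return makespan

# Example: 1000 tasks on 4 processors whose speeds differ but are never
# consulted by the dispatcher itself.
print(dynamic_dispatch(1000, 4, lambda p, s: s + 1.0 + 0.3 * p))
```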


International Conference on Parallel Processing | 2003

Near-optimal dynamic task scheduling of independent coarse-grained tasks onto a computational grid

Noriyuki Fujimoto; Kenichi Hagihara

The most common objective function of task scheduling problems is makespan. However, on a computational grid, the second-best makespan may be much longer than the optimal makespan because the computing power of a grid varies over time. So, if the performance measure is makespan, there is no approximation algorithm in general for scheduling onto a grid. A novel criterion of a schedule is therefore proposed, called total processor cycle consumption: the total number of instructions the grid could compute up to the completion time of the schedule. For this criterion, the paper gives a (1 + m(log_e(m-1) + 1)/n)-approximation algorithm for scheduling n independent coarse-grained tasks of the same length onto a grid with m processors. The proposed algorithm does not use any prediction information on the performance of the underlying resources. This implies the nontrivial result that the computing power consumed by a parameter sweep application can, in such a case, be limited to within (1 + m(log_e(m-1) + 1)/n) times that required by an optimal schedule, regardless of how the speed of each processor varies over time.
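To see how the guarantee behaves, the snippet below evaluates the bound (1 + m(log_e(m-1) + 1)/n) stated in the abstract for a few illustrative values of m and n; the specific numbers are examples, not from the paper.

```python
import math

def approx_ratio(m, n):
    """The bound (1 + m(ln(m-1) + 1)/n) stated in the abstract."""
    return 1.0 + m * (math.log(m - 1) + 1.0) / n

for m, n in [(10, 1_000), (100, 10_000), (100, 1_000_000)]:
    print(f"m={m:4d}  n={n:9d}  ratio <= {approx_ratio(m, n):.4f}")
# With m fixed, the bound tends to 1 as the number of tasks n grows.
```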


Parallel Computing | 2010

High-performance cone beam reconstruction using CUDA compatible GPUs

Yusuke Okitsu; Fumihiko Ino; Kenichi Hagihara

Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the NVIDIA graphics processing unit (GPU). This paper presents an acceleration method for cone beam reconstruction using CUDA compatible GPUs. The proposed method accelerates the Feldkamp, Davis, and Kress (FDK) algorithm using three techniques: (1) off-chip memory access reduction for saving the memory bandwidth; (2) loop unrolling for hiding the memory latency; and (3) multithreading for exploiting multiple GPUs. We describe how these techniques can be incorporated into the reconstruction code. We also show an analytical model to understand the reconstruction performance on multi-GPU environments. Experimental results show that the proposed method runs at 83% of the theoretical memory bandwidth, achieving a throughput of 64.3 projections per second (pps) for reconstruction of a 512^3-voxel volume from 360 512^2-pixel projections. This performance is 41% higher than that of the previous CUDA-based method and is 24 times faster than a CPU-based method optimized by vector intrinsics. Detailed analyses are also presented to understand how effectively the acceleration techniques increase the reconstruction performance over a naive method. We also demonstrate out-of-core reconstruction for large-scale datasets of up to a 1024^3-voxel volume.
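The paper's analytical model is not reproduced here; the following back-of-envelope sketch only illustrates the idea of a memory-bandwidth-bound throughput ceiling. Every constant (peak bandwidth, bytes moved per voxel update) is a hypothetical placeholder; only the 83% efficiency figure is taken from the abstract.

```python
def projection_throughput(peak_bw_gbs, efficiency, bytes_per_voxel, voxels):
    """Rough memory-bound ceiling on projections processed per second.

    Assumes each projection updates every voxel once and each update moves
    `bytes_per_voxel` bytes to or from off-chip memory (illustrative model,
    not the paper's).
    """
    sustained = peak_bw_gbs * 1e9 * efficiency     # bytes/s actually sustained
    return sustained / (bytes_per_voxel * voxels)

# Hypothetical GPU: 100 GB/s peak, 83% efficiency (the fraction reported),
# 8 bytes per voxel update, 512^3-voxel volume.
print(projection_throughput(100, 0.83, 8, 512**3))  # ~77 pps, same order as 64.3
```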


Journal of Physiological Sciences | 2008

Specifications of insilicoML 1.0: a multilevel biophysical model description language.

Yoshiyuki Asai; Yasuyuki Suzuki; Yoshiyuki Kido; Hideki Oka; Eric Martin Heien; Masao Nakanishi; Takahito Urai; Kenichi Hagihara; Yoshihisa Kurachi; Taishin Nomura

An extensible markup language format, insilicoML (ISML), version 0.1, describing multilevel biophysical models has been developed and is available in the public domain. ISML is fully compatible with CellML 1.0, a model description standard developed by the IUPS Physiome Project, enhancing knowledge integration and model sharing. This article illustrates the new specifications of ISML 1.0, which largely extend the capability of ISML 0.1. ISML 1.0 can describe various types of mathematical models, including ordinary/partial differential and difference equations representing the dynamics of physiological functions, together with the geometry of the living organisms underlying those functions. ISML 1.0 describes a model as a set of functional elements (modules), each of which can specify mathematical expressions of the functions. Structural and logical relationships between any two modules are specified by edges, which allow modular, hierarchical, and/or network representations of the model. The role of edge relationships is enriched by keywords for use in constructing a physiological ontology. The ontology is further improved by traceability of the history of model development and by links between different ISML models stored in the model database using meta-information. ISML 1.0 is designed to operate with a model database and integrated environments for model development and simulation, for knowledge integration and discovery.
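The actual ISML 1.0 schema is not given in the abstract; solely to make the module-and-edge structure it describes concrete, here is a hypothetical in-memory analogue in which modules hold mathematical expressions and typed edges relate them. All names, equations, and edge keywords below are invented for illustration and are not ISML syntax.

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    """Functional element holding mathematical expressions (illustrative)."""
    name: str
    equations: list = field(default_factory=list)

@dataclass
class Edge:
    """Typed relationship between two modules; the keyword indicates its
    role, echoing the ontology-oriented edge keywords the abstract mentions
    (the keyword strings here are invented)."""
    src: str
    dst: str
    keyword: str

heart = Module("heart", ["dV/dt = Q_in - Q_out"])
cell = Module("cardiomyocyte", ["dV_m/dt = -I_ion / C_m"])
edges = [Edge(heart.name, cell.name, "structural-containment")]
```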


International Symposium on Parallel and Distributed Computing | 2003

Near-optimal dynamic task scheduling of precedence constrained coarse-grained tasks onto a computational grid

Noriyuki Fujimoto; Kenichi Hagihara

The most common objective function of task scheduling problems is makespan. However, on a computational grid, the second-best makespan may be much longer than the optimal makespan because the speed of each processor of a grid varies over time. So, if the performance measure is makespan, there is no approximation algorithm in general for scheduling onto a grid. In contrast, the authors recently proposed the computing power consumed by a schedule as a criterion of the schedule. For this criterion, this paper gives a (1 + L_cp(n)·m(log_e(m-1) + 1)/n)-approximation algorithm for scheduling precedence-constrained coarse-grained tasks of the same length onto a grid, where n is the number of tasks, m is the number of processors, and L_cp(n) is the length of the critical path of the task graph. The proposed algorithm does not use any prediction information on the performance of the underlying resources. L_cp(n) is usually a sublinear function of n, so the above performance guarantee converges to one as n grows. This implies the nontrivial result that the computing power consumed by an application on a grid can, in such a case, be limited to within (1 + L_cp(n)·m(log_e(m-1) + 1)/n) times that required by an optimal schedule.
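The snippet below evaluates the new guarantee under the hedged assumption that the critical path grows like the square root of n, one concrete instance of a sublinear L_cp(n); it shows the ratio approaching one as n grows, as the abstract states. All values are illustrative.

```python
import math

def approx_ratio_prec(m, n, l_cp):
    """The bound (1 + L_cp(n) * m(ln(m-1) + 1) / n) stated in the abstract."""
    return 1.0 + l_cp * m * (math.log(m - 1) + 1.0) / n

m = 100
for n in [10_000, 1_000_000, 100_000_000]:
    l_cp = math.isqrt(n)   # assumption for illustration: L_cp(n) = sqrt(n)
    print(f"n={n:11d}  L_cp={l_cp:6d}  ratio <= {approx_ratio_prec(m, n, l_cp):.4f}")
```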


International Workshop on Distributed Algorithms | 1989

Optimal Fault-Tolerant Distributed Algorithms for Election in Complete Networks with a Global Sense of Direction

Toshimitsu Masuzawa; Naoki Nishikawa; Kenichi Hagihara; Nobuki Tokura

This paper considers the leader election problem (LEP) in asynchronous complete networks with undetectable fail-stop failures. In particular, it discusses whether the presence of a global sense of direction affects the message complexity of LEP in faulty networks. For a complete network of n processors in which k processors start the algorithm spontaneously and at most f_p (< n/2) processors are faulty, this paper shows


Computers & Graphics | 2008

A decompression pipeline for accelerating out-of-core volume rendering of time-varying data

Daisuke Nagayasu; Fumihiko Ino; Kenichi Hagihara

This paper presents a decompression pipeline capable of accelerating out-of-core volume rendering of time-varying scalar data. Our pipeline is based on a two-stage compression method that cooperatively uses the CPU and the graphics processing unit (GPU) to transfer compressed data entirely from the storage device to the video memory. This method combines two different compression algorithms, namely packed volume texture compression (PVTC) and Lempel-Ziv-Oberhumer (LZO) compression, allowing us to exploit both temporal and spatial coherence in time-varying data. Furthermore, it achieves fast decompression by taking advantage of the architecture of each processing unit: a hardware component on the GPU and a large cache on the CPU, suited to decompressing PVTC- and LZO-encoded data, respectively. We also integrate the method with a thread-based pipeline mechanism to increase the data throughput by overlapping the data loading, data decompression, and rendering stages. Our pipelined renderer runs on a quad-core PC and achieves a video rate of 41 frames per second (fps) on average for 258x258x208-voxel data with 150 time steps. It also demonstrates a near-interactive rate of 8 fps for 512x512x295-voxel data with 411 time steps.
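As a minimal sketch of the thread-based pipelining idea (overlapping loading, decompression, and rendering through bounded queues), consider the following, in which load_step, decompress_step, and render_step are hypothetical per-time-step callbacks, not functions from the paper.

```python
import queue
import threading

def stage(fn, src, dst):
    """Run one pipeline stage: pull an item, process it, push it onward."""
    while True:
        item = src.get()
        if item is None:              # sentinel: shut this stage down
            if dst is not None:
                dst.put(None)         # propagate shutdown downstream
            return
        out = fn(item)
        if dst is not None:
            dst.put(out)

def run_pipeline(time_steps, load_step, decompress_step, render_step, depth=4):
    """Overlap load -> decompress -> render across threads (illustrative)."""
    q_in, q_load, q_dec = (queue.Queue(maxsize=depth) for _ in range(3))
    threads = [
        threading.Thread(target=stage, args=(load_step, q_in, q_load)),
        threading.Thread(target=stage, args=(decompress_step, q_load, q_dec)),
        threading.Thread(target=stage, args=(render_step, q_dec, None)),
    ]
    for t in threads:
        t.start()
    for t_idx in range(time_steps):   # feed one work item per time step
        q_in.put(t_idx)
    q_in.put(None)
    for t in threads:
        t.join()
```

The bounded queues (maxsize=depth) keep a fixed number of time steps in flight, so a slow stage applies backpressure instead of letting loaded data pile up in memory.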


Systems and Computers in Japan | 1991

Efficient distributed algorithms solving problems about the connectivity of network

Jungho Park; Nobuki Tokura; Toshimitsu Masuzawa; Kenichi Hagihara

This paper presents efficient distributed algorithms on an asynchronous network for the following problems: finding biconnected components, finding cutpoints, finding bridges, testing biconnectedness, and finding strongly connected components of a directed graph defined on a network. All of these distributed algorithms use a depth-first search tree rooted at an arbitrary processor in the network. The communication complexity of these algorithms is O(n log n + e) and their ideal-time complexity is O(nG(n)), where n and e are the numbers of processors and links, respectively, and G(n) is almost constant. It is also shown that a lower bound on the communication complexity of the five problems is Ω(e) and a lower bound on their ideal-time complexity is Ω(n).
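The distributed algorithms themselves are not spelled out in the abstract; for reference, the sketch below is the classical sequential DFS lowpoint computation for cutpoints, the technique whose DFS-tree structure the distributed versions are built around. This is the sequential analogue only, not the distributed algorithm.

```python
def cutpoints(adj):
    """Articulation points via DFS low-link values (sequential analogue).

    adj: dict mapping each vertex to an iterable of its neighbours
         in an undirected graph.
    """
    disc, low, cut = {}, {}, set()
    timer = 0

    def dfs(u, parent):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:                      # back edge to an ancestor
                low[u] = min(low[u], disc[v])
            else:                              # tree edge
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    cut.add(u)                 # u separates v's subtree
        if parent is None and children > 1:
            cut.add(u)                         # root with >1 DFS subtrees

    for s in adj:
        if s not in disc:
            dfs(s, None)
    return cut

print(cutpoints({0: [1], 1: [0, 2], 2: [1]}))  # {1}: removing 1 disconnects 0 and 2
```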


IEEE International Conference on High Performance Computing, Data, and Analytics | 2008

Accelerating cone beam reconstruction using the CUDA-enabled GPU

Yusuke Okitsu; Fumihiko Ino; Kenichi Hagihara

Compute unified device architecture (CUDA) is a software development platform that enables us to write and run general-purpose applications on the graphics processing unit (GPU). This paper presents a fast method for cone beam reconstruction using the CUDA-enabled GPU. The proposed method is accelerated by two techniques: (1) off-chip memory access reduction; and (2) memory latency hiding. We describe how these techniques can be incorporated into CUDA code. Experimental results show that the proposed method runs at 82% of the peak memory bandwidth, taking 5.6 seconds to reconstruct a 512^3-voxel volume from 360 512^2-pixel projections. This performance is 18% faster than the prior method. Some detailed analyses are also presented to understand how effectively the acceleration techniques increase the reconstruction performance of a naive method.


Discrete Applied Mathematics | 1987

An optimal time algorithm for the k-vertex-connectivity unweighted augmentation problem for rooted directed trees

Toshimitsu Masuzawa; Kenichi Hagihara; Nobuki Tokura

For a given digraph G = (V, A) and a positive integer k, the k-vertex-connectivity unweighted augmentation problem (k-VCUAP) for G is to find a minimum set of arcs A′ (A′ ⊆ (V × V) − A) such that the digraph (V, A ∪ A′) is k-vertex-connected. It is known that the time complexity of the 1-VCUAP for an arbitrary digraph is Θ(|V| + |A|). However, it is still open whether or not there exist polynomial-time algorithms for the k-VCUAP (k ≥ 2) on general digraphs. This paper shows that the time complexity of the k-VCUAP (k ≥ 2) is Θ(k|V|) for every rooted directed tree.

Collaboration


Dive into Kenichi Hagihara's collaborations.

Top Co-Authors

Xin Yao

University of Science and Technology
