Kenjiro Taura | Researchain

Publication

Featured research published by Kenjiro Taura.

conference on high performance computing (supercomputing) | 2000

The MicroGrid: a scientific tool for modeling computational grids

Hyo Jung Song; Xianan Liu; Dennis Jakobsen; Ranjita Bhagwan; Xingbin Zhang; Kenjiro Taura; Andrew A. Chien

The complexity and dynamic nature of the Internet (and the emerging Computational Grid) demand that middleware and applications adapt to the changes in configuration and availability of resources. However, to the best of our knowledge there are no simulation tools which support systematic exploration of dynamic Grid software (or Grid resource) behavior. We describe our vision and initial efforts to build tools to meet these needs. Our MicroGrid simulation tools enable Globus applications to be run in arbitrary virtual grid resource environments, enabling broad experimentation. We describe the design of these tools, and their validation on micro-benchmarks, the NAS parallel benchmarks, and an entire Grid application. These validation experiments show that the MicroGrid can match actual experiments within a few percent (2% to 4%).

Proceedings 9th Heterogeneous Computing Workshop (HCW 2000) (Cat. No.PR00556) | 2000

A heuristic algorithm for mapping communicating tasks on heterogeneous resources

Kenjiro Taura; Andrew A. Chien

A heuristic algorithm that maps data processing tasks onto heterogeneous resources (i.e. processors and links of various capacities) is presented. The algorithm tries to achieve a good throughput of the whole data processing pipeline, taking both parallelism (load balance) and communication volume (locality) into account. It performs well both under computationally intensive and communication-intensive conditions. When all tasks/processors are of the same size and communication is negligible, it quickly distributes the computation load over the processors and finds the optimal mapping. As communication becomes significant and reveals a bottleneck, it trades parallelism for reduction of communication traffic. Experimental results using a topology generator that models the Internet show that it performs significantly better than communication-ignorant schedulers.
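
As a rough illustration of the trade-off this abstract describes, the sketch below scores a candidate task-to-processor mapping by its bottleneck: the slowest processor or link bounds the throughput of the pipeline. It is not the paper's heuristic. The function name `bottleneck_time`, the data layout, and the cost model (work divided by processor speed, traffic divided by link bandwidth) are assumptions made for illustration only.

```python
from collections import defaultdict


def bottleneck_time(work, traffic, mapping, speed, bandwidth):
    """Return the time per pipeline item spent on the slowest processor or link.

    All names and the cost model here are hypothetical:
      work:      {task: computation per item}
      traffic:   {(task_a, task_b): data volume per item}
      mapping:   {task: processor}
      speed:     {processor: computation per unit time}
      bandwidth: {(proc_a, proc_b): data volume per unit time}, key sorted
    """
    cpu_load = defaultdict(float)
    for task, proc in mapping.items():
        cpu_load[proc] += work[task] / speed[proc]

    link_load = defaultdict(float)
    for (a, b), volume in traffic.items():
        pa, pb = mapping[a], mapping[b]
        if pa != pb:  # co-located tasks communicate for free in this model
            link = tuple(sorted((pa, pb)))
            link_load[link] += volume / bandwidth[link]

    return max(list(cpu_load.values()) + list(link_load.values()))


work = {"t1": 4.0, "t2": 4.0}
traffic = {("t1", "t2"): 10.0}
speed = {"p1": 1.0, "p2": 1.0}

spread = {"t1": "p1", "t2": "p2"}  # favors parallelism
packed = {"t1": "p1", "t2": "p1"}  # favors locality

fast_link = {("p1", "p2"): 100.0}
slow_link = {("p1", "p2"): 1.0}

for name, links in (("fast link", fast_link), ("slow link", slow_link)):
    print(name,
          "spread:", bottleneck_time(work, traffic, spread, speed, links),
          "packed:", bottleneck_time(work, traffic, packed, speed, links))
```

Under this toy model, spreading the two tasks is better when the link is fast, while packing them onto one processor is better once the link becomes the bottleneck, which mirrors the trade of parallelism for reduced traffic mentioned in the abstract.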

acm sigplan symposium on principles and practice of parallel programming | 2003

Phoenix: a parallel programming model for accommodating dynamically joining/leaving resources

Kenjiro Taura; Kenji Kaneda; Toshio Endo; Akinori Yonezawa

This paper proposes Phoenix, a programming model for writing parallel and distributed applications that accommodate dynamically joining/leaving compute resources. In the proposed model, nodes involved in an application see a large and fixed virtual node name space. They communicate via messages, whose destinations are specified by virtual node names, rather than names bound to a physical resource. We describe the Phoenix API and show how it allows a transparent migration of application states, as well as dynamically joining/leaving nodes as its by-product. We also demonstrate through several application studies that the Phoenix model is close enough to regular message passing, thus it is a general programming model that facilitates porting many parallel applications/algorithms to more dynamic environments. Experimental results indicate applications that have a small task migration cost can quickly take advantage of dynamically joining resources using Phoenix. Divide-and-conquer algorithms written in Phoenix achieved a good speedup with a large number of nodes across multiple LANs (120 times speedup using 169 CPUs across three LANs). We believe Phoenix provides a useful programming abstraction and platform for emerging parallel applications that must be deployed across multiple LANs and/or shared clusters having dynamically varying resource conditions.
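
The idea of addressing messages to a fixed virtual node name space, rather than to physical machines, can be sketched as follows. This is not the Phoenix API, which the abstract does not show. The class `VirtualNameSpace`, its methods, and the policy of splitting the widest interval on a join are all hypothetical choices made for illustration.

```python
import bisect


class VirtualNameSpace:
    """Maps a fixed range of virtual node names [0, size) onto physical nodes.

    Hypothetical illustration; not the interface described in the paper.
    """

    def __init__(self, size, first_node):
        self.size = size
        self.starts = [0]           # sorted start of each interval
        self.owners = [first_node]  # owners[i] owns [starts[i], next start)

    def owner_of(self, vname):
        """Return the physical node currently responsible for a virtual name."""
        if not 0 <= vname < self.size:
            raise ValueError("virtual node name out of range")
        return self.owners[bisect.bisect_right(self.starts, vname) - 1]

    def _end(self, i):
        return self.starts[i + 1] if i + 1 < len(self.starts) else self.size

    def join(self, node):
        """A joining node takes over the upper half of the widest interval."""
        i = max(range(len(self.starts)),
                key=lambda k: self._end(k) - self.starts[k])
        mid = (self.starts[i] + self._end(i)) // 2
        if mid == self.starts[i]:
            raise RuntimeError("no interval left to split")
        self.starts.insert(i + 1, mid)
        self.owners.insert(i + 1, node)

    def leave(self, node, heir):
        """A leaving node hands every interval it owns to `heir`."""
        if heir == node or heir not in self.owners:
            raise ValueError("heir must be another participating node")
        self.owners = [heir if o == node else o for o in self.owners]


ns = VirtualNameSpace(1024, "A")
ns.join("B")                # B now serves virtual names 512..1023
print(ns.owner_of(700))     # the sender still addresses name 700
ns.leave("B", heir="A")     # B leaves; A serves name 700 again
print(ns.owner_of(700))
```

In this sketch a sender always uses the same virtual name (700 here); only the mapping from names to physical nodes changes as nodes join and leave, which is the property the abstract attributes to the model.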

conference on object oriented programming systems languages and applications | 1993

Highly efficient and encapsulated re-use of synchronization code in concurrent object-oriented languages

Satoshi Matsuoka; Kenjiro Taura; Akinori Yonezawa

Re-use of synchronization code in concurrent OO-languages has been considered difficult due to inheritance anomaly, which we minimize with our new proposal. Designed with high practicality in mind, we propose language primitives (plus their implementation) with the following characteristics: (1) it allows multiple synchronization schemes (the language schemes for programming synchronization) to coexist and be integrated, (2) re-use of synchronization code is done similarly to sequential OO-languages for user familiarity, (3) it offers high degree of encapsulation; even synchronization schemes could be encapsulated in superclasses in many cases, and (4) it can be efficiently implemented on conventional MPPs. We demonstrate the effectiveness of our proposal with solutions to the example inheritance anomaly cases from [16]. We also give an overview of the implementation architecture, along with preliminary benchmarks. The proposed language primitives are being incorporated into our ABCL/onAP1000 running on Fujitsu’s 512-node MPP, AP1000.

acm sigplan symposium on principles and practice of parallel programming | 1999

StackThreads/MP: integrating futures into calling standards

Kenjiro Taura; Kunio Tabata; Akinori Yonezawa

An implementation scheme of fine-grain multithreading that requires no changes to current calling standards for sequential languages and only modest extensions to sequential compilers is described. Like previous similar systems, it performs an asynchronous call as if it were an ordinary procedure call, and detaches the callee from the caller when the callee suspends or either of them migrates to another processor. Unlike previous similar systems, it detaches and connects arbitrary frames generated by off-the-shelf sequential compilers obeying calling standards. As a consequence, it requires neither a frontend preprocessor nor a native code generator that has a builtin notion of parallelism. In practice, the system works with the unmodified GNU C compiler (GCC). Desirable extensions to sequential compilers for guaranteeing portability and correctness of the scheme are clarified and argued to be modest. Experiments indicate that sequential performance is not sacrificed for practical applications and both sequential and parallel performance are comparable to Cilk [8], whose current implementation requires a fairly sophisticated preprocessor to C. These results show that efficient asynchronous calls (a.k.a. future calls) can be integrated into current calling standards with a very small impact both on sequential performance and compiler engineering.

Blood | 2009

HapMap scanning of novel human minor histocompatibility antigens

Michi Kamei; Yasuhito Nannya; Hiroki Torikai; Takakazu Kawase; Kenjiro Taura; Yoshihiro Inamoto; Taro Takahashi; Makoto Yazaki; Satoko Morishima; Kunio Tsujimura; Koichi Miyamura; Tetsuya Ito; Hajime Togari; Stanley R. Riddell; Yoshihisa Kodera; Yasuo Morishima; Toshitada Takahashi; Kiyotaka Kuzushima; Seishi Ogawa; Yoshiki Akatsuka

Minor histocompatibility antigens (mHags) are molecular targets of allo-immunity associated with hematopoietic stem cell transplantation (HSCT) and involved in graft-versus-host disease, but they also have beneficial antitumor activity. mHags are typically defined by host SNPs that are not shared by the donor and are immunologically recognized by cytotoxic T cells isolated from post-HSCT patients. However, the number of molecularly identified mHags is still too small to allow prospective studies of their clinical importance in transplantation medicine, mostly due to the lack of an efficient method for isolation. Here we show that when combined with conventional immunologic assays, the large data set from the International HapMap Project can be directly used for genetic mapping of novel mHags. Based on the immunologically determined mHag status in HapMap panels, a target mHag locus can be uniquely mapped through whole genome association scanning taking advantage of the unprecedented resolution and power obtained with more than 3 000 000 markers. The feasibility of our approach could be supported by extensive simulations and further confirmed by actually isolating 2 novel mHags as well as 1 previously identified example. The HapMap data set represents an invaluable resource for investigating human variation, with obvious applications in genetic mapping of clinically relevant human traits.

Lecture Notes in Computer Science | 2000

Performance Evaluation of OpenMP Applications with Nested Parallelism

Yoshizumi Tanaka; Kenjiro Taura; Mitsuhisa Sato; Akinori Yonezawa

Many existing OpenMP systems do not sufficiently implement nested parallelism. This is supposedly because nested parallelism is believed to require a significant implementation effort, incur a large overhead, or lack applications. This paper demonstrates Omni/ST, a simple and efficient implementation of OpenMP nested parallelism using StackThreads/MP, which is a fine-grain thread library. Thanks to StackThreads/MP, OpenMP parallel constructs are simply mapped onto thread creation primitives of StackThreads/MP, yet they are efficiently managed with a fixed number of threads in the underlying thread package (e.g., Pthreads). Experimental results on Sun Ultra Enterprise 10000 with up to 60 processors show that overhead imposed by nested parallelism is very small (1-3% in five out of six applications, and 8% for the other), and there is a significant scalability benefit for applications with nested parallelism.

acm sigplan symposium on principles and practice of parallel programming | 1993

An efficient implementation scheme of concurrent object-oriented languages on stock multicomputers

Kenjiro Taura; Satoshi Matsuoka; Akinori Yonezawa

Several novel techniques for efficient implementation of concurrent object-oriented languages on general purpose, stock multicomputers are presented. These techniques have been developed in implementing our concurrent object-oriented language ABCL on a Fujitsu Laboratory's experimental multicomputer AP1000 consisting of 512 SPARC chips. The proposed intra-node scheduling mechanism reduces the cost of local message passing. The cost of intra-node asynchronous message passing is about 20 SPARC instructions in the best case, including locality checking, dynamic method lookup, and scheduling. The minimum latency of asynchronous internode message passing is about 9μs, or about 120 instructions, employing the self-dispatching mechanism independently proposed by Eicken et al. A large scale benchmark which involves 9,000,000 message passings shows 440 times speedup on the 512-node system compared to the sequential version of the same algorithm. We rely on simple hardware support for message passing and use no specialized architectural supports for object-oriented computing. Thus, we are able to enjoy the benefits of future progress in standard processor technology. Our result shows that concurrent object-oriented languages can be implemented efficiently on conventional multicomputers.

grid computing | 2005

A scalable and efficient self-organizing failure detector for grid applications

Yuuki Horita; Kenjiro Taura; Takashi Chikayama

Failure detection and group membership management are basic building blocks for self-repairing systems in distributed environments, which need to be scalable, reliable, and efficient in practice. As available resources become larger in size and more widely distributed, it becomes more essential that they can be easily used with a small amount of manual configuration in grid environments, where connectivity between different networks may be limited by firewalls and NATs. In this paper, we present a scalable failure detection protocol that self-organizes in grid environments. Our failure detectors autonomously create dispersed monitoring relationships among participating processes with almost no manual configuration so that each process will be monitored by a small number of other processes, and quickly disseminate notifications along the monitoring relationships when failures are detected. With simulations and real experiments, we showed that our failure detector has practical scalability, high reliability, and good efficiency. The overhead with 313 processes was at most 2 percent even when the heartbeat interval was set to 0.1 second, and accordingly smaller when it was longer.
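
The general pattern of having each process watched by a small number of peers, with failures suspected after missed heartbeats, can be sketched as below. This is a generic heartbeat-timeout illustration, not the protocol from the paper: it ignores self-organization, firewalls, NATs, and notification dissemination. The names `assign_monitors` and `HeartbeatMonitor`, the random choice of monitors, and the timeout rule are assumptions.

```python
import random


def assign_monitors(processes, k, seed=0):
    """Pick k distinct monitors for every process, never the process itself.

    Hypothetical helper; random selection is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return {p: rng.sample([q for q in processes if q != p], k)
            for p in processes}


class HeartbeatMonitor:
    """Tracks the last heartbeat time seen from each monitored process."""

    def __init__(self, targets, timeout):
        self.timeout = timeout
        self.last_seen = {t: 0.0 for t in targets}

    def heartbeat(self, target, now):
        """Record that `target` was heard from at time `now`."""
        self.last_seen[target] = now

    def suspects(self, now):
        """Return targets whose last heartbeat is older than the timeout."""
        return sorted(t for t, seen in self.last_seen.items()
                      if now - seen > self.timeout)


procs = [f"p{i}" for i in range(10)]
monitors = assign_monitors(procs, k=3)

m = HeartbeatMonitor(["p1", "p2"], timeout=0.3)
m.heartbeat("p1", now=0.1)
m.heartbeat("p2", now=0.1)
m.heartbeat("p1", now=0.4)   # p2 has gone silent
print(m.suspects(now=0.5))
```

Keeping the number of monitors per process small and fixed is what keeps the per-process heartbeat traffic independent of the total number of processes in this sketch.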

high performance distributed computing | 2010

File-access patterns of data-intensive workflow applications and their implications to distributed filesystems

Takeshi Shibata; SungJun Choi; Kenjiro Taura

This paper studies five real-world data-intensive workflow applications in the fields of natural language processing, astronomy image analysis, and web data analysis. Data-intensive workflows are increasingly becoming important applications for cluster and Grid environments. They open new challenges to various components of workflow execution environments including job dispatchers, schedulers, file systems, and file staging tools. The keys to achieving high performance are efficient data sharing among executing hosts and locality-aware scheduling that reduces the amount of data transfer. While much work has been done on scheduling workflows, much of it uses synthetic or random workloads. As such, its impact on real workloads is largely unknown. Understanding characteristics of real-world workflow applications is a required step to promote research in this area. To this end, we analyse real-world workflow applications focusing on their file access patterns and summarize their implications to schedulers and file system/staging designs.
