Publications


Featured research published by Osamu Tatebe.


Cluster Computing and the Grid | 2002

Grid Datafarm Architecture for Petascale Data Intensive Computing

Osamu Tatebe; Youhei Morita; Satoshi Matsuoka; Noriyuki Soda; Satoshi Sekiguchi

The Grid Datafarm (Gfarm) architecture is designed for global petascale data-intensive computing. It provides a global parallel file system with online petascale storage, scalable I/O bandwidth, and scalable parallel processing, and it can exploit local I/O in a grid of clusters with tens of thousands of nodes. Gfarm parallel I/O APIs and commands provide a single file-system image and manipulate file-system metadata consistently. Fault tolerance and load balancing are managed automatically by file duplication or by recomputation using a command history log. A preliminary performance evaluation showed scalable disk I/O and network bandwidth on 64 nodes of the Presto III Athlon cluster: Gfarm parallel I/O write and read operations achieved data transfer rates of 1.74 GB/s and 1.97 GB/s, respectively, and a Gfarm parallel file copy reached 443 MB/s with 23 parallel streams on Myrinet 2000. The Gfarm architecture is expected to enable petascale data-intensive Grid computing with I/O bandwidth that scales to the TB/s range and scalable computational power.
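
The single file-system image described above rests on a metadata service that maps each logical file to the hosts holding its replicas, so a reader can prefer a local copy. Below is a minimal, illustrative sketch of that idea in Python; the class and function names (Metadata, pick_replica) are assumptions for exposition, not Gfarm's actual API.

```python
# Toy sketch (not the real Gfarm API): a metadata catalogue mapping a
# logical path to its replica locations, and a reader that prefers a
# local replica to exploit local disk I/O.
import socket

class Metadata:
    """Single-filesystem-image catalogue: logical path -> replica hosts."""
    def __init__(self):
        self.replicas = {}   # e.g. "/gfarm/data/run01" -> {"node03", "node17"}

    def register(self, path, host):
        self.replicas.setdefault(path, set()).add(host)

    def locate(self, path):
        return self.replicas[path]

def pick_replica(meta, path):
    """Prefer a replica on this node so the read stays on the local disk."""
    hosts = meta.locate(path)
    me = socket.gethostname()
    return me if me in hosts else next(iter(hosts))

meta = Metadata()
meta.register("/gfarm/data/run01", "node03")
meta.register("/gfarm/data/run01", "node17")   # duplicate for fault tolerance
print(pick_replica(meta, "/gfarm/data/run01"))
```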


New Generation Computing | 2010

Gfarm Grid File System

Osamu Tatebe; Kohei Hiraga; Noriyuki Soda

The Gfarm Grid file system is a global distributed file system for sharing data and supporting distributed data-intensive computing. It federates the local file systems of compute nodes to maximize distributed file I/O bandwidth, and it allows multiple replicas of a file to be stored at any location to avoid concentrated read access to hot files. Data-location-aware process scheduling improves the file I/O performance of distributed data-intensive computing. This paper discusses the design and implementation of the Gfarm Grid file system and reports on its performance.
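
The data-location-aware scheduling mentioned above can be pictured as placing each job on a node that already holds a replica of its input, falling back to the least-loaded node otherwise. The following toy sketch uses invented data structures, not Gfarm's real scheduler, to show the idea.

```python
# Minimal sketch of data-location-aware scheduling: run each job where a
# replica of its input already resides, else on the least-loaded node.
def schedule(jobs, replica_map, nodes):
    """jobs: {job: input_path}; replica_map: {path: set of nodes}."""
    load = {n: 0 for n in nodes}
    placement = {}
    for job, path in jobs.items():
        local = replica_map.get(path, set()) & set(nodes)
        # Prefer the least-loaded node that holds the data; else any node.
        candidates = local or set(nodes)
        best = min(candidates, key=lambda n: load[n])
        placement[job] = best
        load[best] += 1
    return placement

replica_map = {"a.dat": {"n1"}, "b.dat": {"n2", "n3"}}
jobs = {"j1": "a.dat", "j2": "b.dat", "j3": "c.dat"}
print(schedule(jobs, replica_map, ["n1", "n2", "n3"]))
```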


Archive | 2005

Gfarm v2: A Grid file system that supports high-performance distributed and parallel data computing

Osamu Tatebe; Satoshi Sekiguchi; Youhei Morita; Satoshi Matsuoka; Noriyuki Soda

The Grid Datafarm architecture is designed to facilitate reliable file sharing and high-performance distributed and parallel data computing in a Grid across administrative domains by providing a global virtual file system. Gfarm v2 is an attempt to implement a global virtual file system that supports the complete set of standard POSIX APIs while still retaining the parallel and distributed data-computing features of the Grid Datafarm architecture. This paper discusses the design and implementation of Gfarm v2, which provides a secure, robust, scalable, and high-performance global virtual file system.
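
Because Gfarm v2 aims for full POSIX semantics, ordinary file code should work unchanged once the global file system is mounted. The snippet below illustrates this with plain POSIX calls in Python; the mount point /mnt/gfarm is a hypothetical example, not a path prescribed by the paper.

```python
# With a POSIX-compatible global file system mounted (mount point assumed
# here to be /mnt/gfarm), no file-system-specific API is needed.
import os

path = "/mnt/gfarm/experiments/out.txt"
os.makedirs(os.path.dirname(path), exist_ok=True)

with open(path, "w") as f:        # plain POSIX open/write/close
    f.write("results\n")

st = os.stat(path)                # plain POSIX metadata access
print(st.st_size, "bytes")
```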


High Performance Distributed Computing | 2003

Performance analysis of scheduling and replication algorithms on Grid Datafarm architecture for high-energy physics applications

Atsuko Takefusa; Satoshi Matsuoka; Osamu Tatebe; Youhei Morita

A Data Grid is a Grid for ubiquitous access and analysis of large-scale data. Because Data Grids are in the early stages of development, the performance of petabyte-scale models in a realistic data-processing setting has not been well investigated. By enhancing our Bricks Grid simulator to accommodate Data Grid scenarios, we investigate and compare the performance of different Data Grid models. These are categorized mainly as either central or tier models, and they employ various scheduling and replication strategies under realistic assumptions of job processing for CERN LHC experiments on the Grid Datafarm system. Our results show that the central model is efficient, but that the tier model, with its greater resources and its speculative class of background replication policies, is quite effective and achieves higher performance, even though each tier is smaller than the central model.
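
A deliberately tiny back-of-the-envelope model can convey the central-versus-tier comparison: in the tier model, speculative background replication lets a fraction of jobs skip the WAN transfer entirely. The functions and numbers below are illustrative assumptions only, not the Bricks simulator.

```python
# Toy model of the paper's two topologies: a central site serving every job
# versus tiers that hold background replicas. All parameters are invented.
def central_time(jobs, wan_cost, cpu_cost, central_capacity):
    # All jobs queue at one site; every transfer crosses the WAN.
    per_job = wan_cost + cpu_cost
    return per_job * jobs / central_capacity

def tier_time(jobs, wan_cost, cpu_cost, tiers, replica_hit):
    # replica_hit: fraction of jobs whose input a speculative background
    # policy already replicated to their tier (LAN cost treated as ~0).
    per_job = (1 - replica_hit) * wan_cost + cpu_cost
    return per_job * jobs / tiers

print(central_time(1000, wan_cost=5.0, cpu_cost=1.0, central_capacity=4))  # 1500.0
print(tier_time(1000, wan_cost=5.0, cpu_cost=1.0, tiers=4, replica_hit=0.8))  # 500.0
```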


International Conference on Cluster Computing | 2004

GNET-1: gigabit Ethernet network testbed

Yuetsu Kodama; Tomohiro Kudoh; Ryousei Takano; H. Sato; Osamu Tatebe; Satoshi Sekiguchi

GNET-1 is a fully programmable network testbed. It provides functions such as wide area network emulation, network instrumentation, traffic shaping, and traffic generation at gigabit Ethernet wire speeds by programming the core FPGA. GNET-1 is a powerful tool for developing network-aware grid software. It is also a network monitoring and traffic-shaping tool that provides high-performance communication over wide area networks. This work describes several sample uses of GNET-1 and presents its architecture.
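
As a software analogy for GNET-1's traffic-shaping function, the classic token-bucket algorithm limits a stream to a target rate with a bounded burst. The Python sketch below shows that algorithm only; GNET-1 itself implements shaping in FPGA hardware at wire speed.

```python
# Token-bucket traffic shaper: a software analogy of one GNET-1 function,
# not its FPGA implementation.
import time

class TokenBucket:
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0        # refill rate in bytes per second
        self.capacity = burst_bytes       # maximum burst size
        self.tokens = burst_bytes
        self.stamp = time.monotonic()

    def send(self, nbytes):
        """Block until nbytes may leave without exceeding the shaped rate."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.stamp) * self.rate)
            self.stamp = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

shaper = TokenBucket(rate_bps=1_000_000, burst_bytes=16_384)
shaper.send(1500)   # one Ethernet-sized frame
```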


Cluster Computing and the Grid | 2012

Workflow Scheduling to Minimize Data Movement Using Multi-constraint Graph Partitioning

Masahiro Tanaka; Osamu Tatebe

Among scheduling algorithms for scientific workflows, graph partitioning is a technique to minimize data transfer between nodes or clusters. However, when graph partitioning is applied naively to a complex workflow DAG, tasks in each parallel phase are not always evenly assigned to compute nodes, since the partitioning algorithm is unaware of the edge directions that represent task dependencies. We therefore propose a new task-assignment method based on multi-constraint graph partitioning, which relates the dimensions of the vertex weight vectors to the rank of a task's phase, defined by traversing the task graph. We implemented our algorithm in the Pwrake workflow system and evaluated the performance of the Montage workflow on a computer cluster. The results show that the file size accessed from remote nodes is reduced from 88% to 14% of the total file size accessed during the workflow, and that the elapsed time is reduced by 31%.
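
The core of the method is the construction of multi-constraint vertex weights: a task's phase rank, obtained by traversing the DAG, selects the dimension that carries its weight, so a partitioner must balance every phase separately. The sketch below builds those weight vectors for a toy workflow; handing them to a multi-constraint partitioner such as METIS is omitted.

```python
# Build per-task weight vectors for multi-constraint graph partitioning:
# the phase rank (longest-path depth in the DAG) picks the vector dimension.
from functools import lru_cache

dag = {            # task -> prerequisite tasks (a toy workflow DAG)
    "t1": [], "t2": [], "t3": ["t1"], "t4": ["t1", "t2"], "t5": ["t3", "t4"],
}

@lru_cache(maxsize=None)
def rank(task):
    deps = dag[task]
    return 0 if not deps else 1 + max(rank(d) for d in deps)

nphases = 1 + max(rank(t) for t in dag)

def weight_vector(task):
    v = [0] * nphases
    v[rank(task)] = 1          # unit weight in this task's phase dimension
    return v

for t in sorted(dag):
    print(t, weight_vector(t))
```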


Computer Physics Communications | 2011

Building the International Lattice Data Grid

Mark G. Beckett; Paul D. Coddington; Balint Joo; C.M. Maynard; Dirk Pleiter; Osamu Tatebe; T. Yoshié

We present the International Lattice Data Grid (ILDG), a loosely federated grid-of-grids for sharing data from Lattice Quantum Chromodynamics (LQCD) simulations. The ILDG comprises metadata, file-format, and web-service standards, which can be used to wrap regional data-grid interfaces, allowing seamless access to catalogues and data in a diverse set of collaborating regional grids. We discuss the technological underpinnings of the ILDG, primarily the metadata and the middleware, and offer a critique of its various aspects with the hindsight of the design work and two years of production.
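
The grid-of-grids structure can be pictured as a thin federation layer that fans a single query out to regional catalogue interfaces behind a common facade. The sketch below invents minimal interfaces for illustration; real ILDG access goes through its standardized web services.

```python
# Toy federation facade: one logical query, answered by every regional
# grid's catalogue. Interfaces and entries are invented for illustration.
class RegionalGrid:
    def __init__(self, name, catalogue):
        self.name, self.catalogue = name, catalogue

    def query(self, keyword):
        return [entry for entry in self.catalogue if keyword in entry]

federation = [
    RegionalGrid("UKQCD", ["nf2_wilson_b5.20", "nf0_quenched_b6.00"]),
    RegionalGrid("JLDG",  ["nf2+1_clover_b1.90"]),
]

def ildg_search(keyword):
    # Fan the query out across the federation behind one facade.
    return {g.name: g.query(keyword) for g in federation}

print(ildg_search("nf2"))
```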


Grid Computing | 2011

Using the Gfarm File System as a POSIX Compatible Storage Platform for Hadoop MapReduce Applications

Shunsuke Mikami; Kazuki Ohta; Osamu Tatebe

MapReduce is a promising parallel programming model for processing large data sets. Hadoop is an up-and-coming open-source implementation of MapReduce that uses the Hadoop Distributed File System (HDFS) to store input and output data. Because HDFS lacks POSIX compatibility, it is difficult for existing software to access data stored in HDFS directly, so storage cannot be shared between existing software and MapReduce applications. For external applications to process data with MapReduce, the data must first be imported into HDFS, processed, and then exported to a POSIX-compatible file system, which results in a large number of redundant file operations. To solve this problem, we propose using the Gfarm file system instead of HDFS. Gfarm is a POSIX-compatible distributed file system with an architecture similar to that of HDFS. We design and implement a Hadoop-Gfarm plug-in that enables Hadoop MapReduce to access files on Gfarm efficiently. We compared the MapReduce workload performance of HDFS, Gfarm, PVFS, and GlusterFS, all open-source distributed file systems. Our evaluations show that Gfarm performed just as well as Hadoop's native HDFS and, in most evaluations, more than twice as well as PVFS and GlusterFS.
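
The redundant file operations the paper targets can be seen in a simple byte-count model: with HDFS, data is staged in and out around the MapReduce run, while a POSIX-compatible file system lets existing tools and MapReduce read the same files in place. The figures below are illustrative only.

```python
# Illustrative byte counts for the two pipelines; not measured data.
def hdfs_pipeline(input_bytes, output_bytes):
    staged_in = input_bytes          # extra copy into HDFS before the job
    staged_out = output_bytes        # extra copy out of HDFS after the job
    return input_bytes + output_bytes + staged_in + staged_out

def posix_pipeline(input_bytes, output_bytes):
    return input_bytes + output_bytes   # read and write in place

print(hdfs_pipeline(10**9, 10**8))   # 2.2e9 bytes moved
print(posix_pipeline(10**9, 10**8))  # 1.1e9 bytes moved
```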


Cluster Computing and the Grid | 2006

The PRAGMA Testbed - Building a Multi-Application International Grid

Cindy Zheng; David Abramson; Peter W. Arzberger; Shahaan Ayyub; Colin Enticott; Slavisa Garic; Mason J. Katz; Jae-Hyuck Kwak; Bu-Sung Lee; Philip M. Papadopoulos; Sugree Phatanapherom; Somsak Sriprayoonsakul; Yoshio Tanaka; Yusuke Tanimura; Osamu Tatebe; Putchong Uthayopas

This practice-and-experience paper describes the coordination, design, implementation, availability, and performance of the Pacific Rim Applications and Grid Middleware Assembly (PRAGMA) Grid Testbed. Applications in high-energy physics, genome annotation, quantum computational chemistry, wildfire simulation, and protein sequence alignment have driven the middleware requirements, and the testbed provides a mechanism for international users to share software beyond the essential, de facto standard Globus core. In this paper, we describe how human factors, resource availability, and performance issues have affected the middleware, the applications, and the testbed design. We also describe how middleware components for grid monitoring, grid accounting, grid remote procedure calls, grid-aware file systems, and grid-based optimization have dealt with some of the major characteristics of our testbed, and we briefly describe a number of mechanisms we have employed to make software more easily available to testbed administrators.


High Performance Distributed Computing | 2010

Pwrake: a parallel and distributed flexible workflow management tool for wide-area data intensive computing

Masahiro Tanaka; Osamu Tatebe

This paper proposes Pwrake, a parallel and distributed flexible workflow management tool based on Rake, a domain-specific language for building applications in the Ruby programming language. Rake is a tool similar to make and ant; it uses a Rakefile, the equivalent of a Makefile, written in Ruby. Thanks to Ruby's flexible and extensible language features, Rake is a powerful workflow management language. Pwrake extends Rake to manage distributed and parallel workflow execution, including remote job submission and the management of parallel executions. This paper discusses the design and implementation of Pwrake and demonstrates the expressive power of the language and the extensibility of the system using a practical e-Science data-intensive workflow, an astronomical data analysis on the Gfarm file system, as a case study. By extending the scheduling algorithm to be aware of file locations, a 20% speedup is observed using 8 nodes (32 cores) of a PC cluster. Using two PC clusters located at different institutions, file-location-aware scheduling shows scalable speedup. The extensible Pwrake is a promising workflow management tool even for wide-area data analysis.
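
Pwrake's Rake heritage boils down to tasks declared with prerequisites and executed in dependency order, with independent tasks dispatched to workers in parallel. The Python sketch below mimics that structure; the task names and the level-by-level executor are illustrative assumptions, not Pwrake's API.

```python
# Make/Rake-flavoured sketch: tasks with prerequisites, run level by level,
# each level's independent tasks dispatched to a worker pool in parallel.
from concurrent.futures import ThreadPoolExecutor

tasks = {   # name -> (prerequisites, action)
    "raw1":   ([],                   lambda: print("fetch raw frame 1")),
    "raw2":   ([],                   lambda: print("fetch raw frame 2")),
    "calib1": (["raw1"],             lambda: print("calibrate frame 1")),
    "calib2": (["raw2"],             lambda: print("calibrate frame 2")),
    "mosaic": (["calib1", "calib2"], lambda: print("assemble mosaic")),
}

def level(name):
    """Dependency depth: tasks at the same level are independent."""
    deps = tasks[name][0]
    return 0 if not deps else 1 + max(level(d) for d in deps)

by_level = {}
for name in tasks:
    by_level.setdefault(level(name), []).append(name)

with ThreadPoolExecutor(max_workers=4) as pool:
    for lv in sorted(by_level):                 # respect dependency order
        list(pool.map(lambda n: tasks[n][1](), by_level[lv]))
```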

Collaboration


Dive into Osamu Tatebe's collaborations.

Top Co-Authors

Ken T. Murata
National Institute of Information and Communications Technology

Kazuya Muranaga
National Institute of Information and Communications Technology

Satoshi Sekiguchi
National Institute of Advanced Industrial Science and Technology

Kazunori Yamamoto
National Institute of Information and Communications Technology

Kentaro Ukawa
National Institute of Information and Communications Technology

Hidenobu Watanabe
National Institute of Information and Communications Technology

Satoshi Matsuoka
Tokyo Institute of Technology