Toyotaro Suzumura | Researchain

Publication

Featured research published by Toyotaro Suzumura.

Journal of Grid Computing | 2003

Ninf-G: A reference implementation of RPC-based programming middleware for Grid computing

Yoshio Tanaka; Hidemoto Nakada; Satoshi Sekiguchi; Toyotaro Suzumura; Satoshi Matsuoka

GridRPC, an RPC mechanism tailored for the Grid, is an attractive programming model for Grid computing. This paper reports on the design and implementation of a GridRPC programming system called Ninf-G. Ninf-G is a reference implementation of the GridRPC API, which has been proposed for standardization at the Global Grid Forum. In this paper, we describe the design, implementation, and typical usage of Ninf-G. A preliminary performance evaluation in both WAN and LAN environments is also reported. Implemented on top of the Globus Toolkit, Ninf-G provides a simple and easy programming interface based on standard Grid protocols and the API for Grid Computing. The overhead of remote procedure calls in Ninf-G is acceptable in both WAN and LAN environments.

international conference on web services | 2005

Optimizing Web services performance by differential deserialization

Toyotaro Suzumura; Toshiro Takase; Michiaki Tatsubori

Web services technology has emerged as a key infrastructure that enables business entities to interact with each other without any human intervention. For the technology to be widely used, especially in fields where a large volume of transactions may be processed, it is highly desirable that the Web services engine tolerate such environments. In this paper, we present a novel approach for improving Web services performance. We first focus on a fundamental characteristic of Web services: the SOAP messages on the wire are mostly generated by machines, and the processed messages have many similarities. By making use of these features and eliminating redundant processing, we propose a new deserialization mechanism that reuses matching regions of application objects deserialized from earlier messages and performs deserialization only for new regions that have not been processed before. In our experiments, we observed that our approach obtained a performance gain of up to 288% by incorporating differential deserialization into the Axis SOAP engine.
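The reuse idea in this abstract can be illustrated with a minimal sketch. It is not the mechanism built into the Axis SOAP engine: it assumes messages arrive already split into byte regions, uses JSON instead of SOAP for brevity, and the class name `DifferentialDeserializer` and its counters are hypothetical.

```python
import json


class DifferentialDeserializer:
    """Reuses deserialized objects for regions whose bytes are unchanged.

    Hypothetical sketch: regions are compared position by position with
    the previous message, and only differing regions are parsed again.
    """

    def __init__(self):
        self._prev_regions = []  # raw byte regions of the previous message
        self._prev_objects = []  # objects deserialized from those regions
        self.reused = 0          # regions served from the previous message
        self.parsed = 0          # regions that had to be deserialized

    def deserialize(self, regions):
        objects = []
        for i, raw in enumerate(regions):
            if i < len(self._prev_regions) and self._prev_regions[i] == raw:
                # Same bytes as last time: reuse the existing object.
                # Note that the object is shared, not copied.
                objects.append(self._prev_objects[i])
                self.reused += 1
            else:
                # New or changed region: deserialize it from scratch.
                objects.append(json.loads(raw))
                self.parsed += 1
        self._prev_regions = list(regions)
        self._prev_objects = objects
        return objects
```

Calling `deserialize` twice with messages that differ only in their second region parses three regions in total and reuses one. Because reused objects are shared between calls, a real implementation would need to handle mutation by the application.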

high performance distributed computing | 2012

Highly scalable graph search for the Graph500 benchmark

Koji Ueno; Toyotaro Suzumura

Graph500 is a new benchmark to rank supercomputers with a large-scale graph search problem. We found that the provided reference implementations are not scalable in a large distributed environment. We devised an optimized method based on 2D partitioning and other methods such as communication compression and vertex sorting. Our optimized implementation can handle BFS (Breadth First Search) of a large graph with 2^36 (68.7 billion) vertices and 2^40 (1.1 trillion) edges in 10.58 seconds while using 1366 nodes and 16,392 CPU cores. This performance corresponds to 103.9 GE/s. We also studied the performance characteristics of our optimized implementation and reference implementations on a large distributed memory supercomputer with a Fat-Tree-based Infiniband network.
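The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes the rate is simply the edge count divided by the elapsed time; the helper name `edges_per_second` is hypothetical.

```python
def edges_per_second(edges: int, seconds: float) -> float:
    """Traversal rate, assuming every edge is counted once."""
    return edges / seconds


vertices = 2 ** 36  # graph size from the abstract
edges = 2 ** 40
giga_edges = edges_per_second(edges, 10.58) / 1e9

print(f"{vertices / 1e9:.1f} billion vertices, "
      f"{edges / 1e12:.1f} trillion edges, {giga_edges:.1f} GE/s")
# prints: 68.7 billion vertices, 1.1 trillion edges, 103.9 GE/s
```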

international world wide web conferences | 2005

An adaptive, fast, and safe XML parser based on byte sequences memorization

Toshiro Takase; Hisashi Miyashita; Toyotaro Suzumura; Michiaki Tatsubori

XML (Extensible Markup Language) processing can incur significant runtime overhead in XML-based infrastructural middleware such as Web service application servers. This paper proposes a novel mechanism for efficiently processing similar XML documents. Given a new XML document as a byte sequence, the XML parser proposed in this paper normally avoids syntactic analysis and instead simply matches the document with previously processed ones, reusing those results. Our parser is adaptive since it partially parses and then remembers XML document fragments that it has not met before. Moreover, it processes safely since its partial parsing correctly checks the well-formedness of documents. Our implementation of the proposed parser complies with the JSR 63 standard of the Java API for XML Processing (JAXP) 1.1 specification. We evaluated the performance of our parser, Deltarser, with messages using Google Web services. Compared with Piccolo (and Apache Xerces), it effectively parses 35% (106%) faster in a server-side use-case scenario, and 73% (126%) faster in a client-side use-case scenario.

international conference on cloud computing | 2011

Elastic Stream Computing with Clouds

Atsushi Ishii; Toyotaro Suzumura

Stream computing, also known as data stream processing, has emerged as a new processing paradigm that processes incoming data streams from tremendous numbers of sensors in a real-time fashion. Data stream applications must have low latency even when the incoming data rate fluctuates wildly. This is almost impossible with a local stream computing environment because its computational resources are finite. To address this kind of problem, we have devised a method and an architecture that transfers data stream processing to a Cloud environment as required in response to changes in the data rate of the input data stream. Since a trade-off exists between application latency and economic cost when using the Cloud environment, we treat it as an optimization problem that minimizes the economic cost of using the Cloud. We implemented a prototype system using Amazon EC2 and an IBM System S stream computing system to evaluate the effectiveness of our approach. Our experimental results show that our approach reduces the costs by 80% while keeping the application response latency low.
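The trade-off between latency and cost can be illustrated with a much simpler rule than the optimization problem described in this abstract. The sketch below assumes that latency stays low as long as total processing capacity covers the input rate, and it rents the smallest number of Cloud VMs that satisfies this. All function names, parameters, and figures are hypothetical.

```python
import math


def cloud_vms_needed(input_rate: float, local_capacity: float,
                     vm_capacity: float) -> int:
    """Smallest number of Cloud VMs so that capacity covers input_rate.

    Rates and capacities share one unit, e.g. tuples per second.
    """
    overflow = input_rate - local_capacity
    if overflow <= 0:
        return 0  # the local environment is sufficient, rent nothing
    return math.ceil(overflow / vm_capacity)


def hourly_cost(input_rate: float, local_capacity: float,
                vm_capacity: float, vm_price: float) -> float:
    """Cloud cost per hour for the VM count chosen above."""
    return cloud_vms_needed(input_rate, local_capacity, vm_capacity) * vm_price
```

With a local capacity of 1000, a per-VM capacity of 400, and an input rate of 1900, this rule rents three VMs, and it rents none once the rate drops back below 1000.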

ieee international symposium on workload characterization | 2011

Performance characteristics of Graph500 on large-scale distributed environment

Toyotaro Suzumura; Koji Ueno; Hitoshi Sato; Katsuki Fujisawa; Satoshi Matsuoka

Graph500 is a new benchmark for supercomputers based on large-scale graph analysis, which is becoming an important form of analysis in many real-world applications. Graph algorithms run well on supercomputers with shared memory. For the Linpack-based supercomputer rankings, TOP500 reports that heterogeneous and distributed-memory supercomputers with large numbers of GPGPUs are becoming dominant. However, the performance characteristics of large-scale graph analysis benchmarks such as Graph500 on distributed-memory supercomputers have so far received little study. This is the first report of a performance evaluation and analysis for Graph500 on a commodity-processor-based distributed-memory supercomputer. We found that the reference implementation “replicated-csr” based on distributed level-synchronized breadth-first search solves a large free graph problem with 2^31 vertices and 2^35 edges (approximately 2.15 billion vertices and 34.3 billion edges) in 3.09 seconds with 128 nodes and 3,072 cores. This equates to 11 giga-edges traversed per second. We describe the algorithms and implementations of the reference implementations of Graph500, and analyze the performance characteristics with varying graph sizes and numbers of computer nodes and different implementations. Our results will also contribute to the development of optimized algorithms for the coming exascale machines.
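For readers unfamiliar with the term, level-synchronized breadth-first search expands the whole current frontier before moving to the next level. The following single-process sketch shows only that structure; it is not the distributed "replicated-csr" implementation, and the function name and adjacency-dict input format are assumptions.

```python
def level_synchronized_bfs(adjacency, root):
    """Returns a dict mapping each reachable vertex to its BFS parent.

    adjacency maps a vertex to an iterable of its neighbours.
    The root is recorded as its own parent in this sketch.
    """
    parent = {root: root}
    frontier = [root]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adjacency.get(u, ()):
                if v not in parent:  # first visit fixes the parent
                    parent[v] = u
                    next_frontier.append(v)
        # Level boundary: in a distributed version, processes would
        # exchange newly discovered vertices and synchronize here.
        frontier = next_frontier
    return parent
```

On the graph `{0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}` with root 0, vertices 1 and 2 are discovered at level one, vertex 3 at level two, and vertex 4 at level three.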

Proceedings of the 2012 ACM SIGPLAN X10 Workshop on | 2012

X10-based massive parallel large-scale traffic flow simulation

Toyotaro Suzumura; Sei Kato; Takashi Imamichi; Mikio Takeuchi; Hiroki Kanezashi; Tsuyoshi Idé; Tamiya Onodera

Optimizing city transportation for smarter cities can have a major impact on the quality of life in urban areas in terms of economic merits and low environmental load. In many cities of the world, transport authorities are facing common challenges such as worsening congestion, insufficient transport infrastructure, increasing carbon emissions, and growing customer needs. To tackle these challenges, fine-grained and large-scale agent simulation is needed for designing smarter cities. In this paper we propose a large-scale traffic simulation platform built on top of X10, a new distributed and parallel programming language. Experimental results demonstrate linearly scalable performance in simulating large-scale traffic flows of the national Japanese road network and a hundred cities around the world using thousands of CPU cores.

ieee international conference on high performance computing, data, and analytics | 2013

Parallel distributed breadth first search on GPU

Koji Ueno; Toyotaro Suzumura

In this paper we propose a highly optimized parallel and distributed BFS on GPU for the Graph500 benchmark. We evaluate the performance of our implementation using the TSUBAME2.0 supercomputer. We achieve 317 GTEPS (billion traversed edges per second) with scale 35 (a large graph with 34.4 billion vertices and 550 billion edges) using 1366 nodes and 4096 GPUs. With this score, the TSUBAME2.0 supercomputer is ranked fourth in the ranking list announced in June 2012. We analyze the performance of our implementation, and the results show that inter-node communication limits the performance of our GPU implementation. We also propose SIMD Variable-Length Quantity (VLQ) encoding for compression of communication data with GPU.
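Variable-length quantity encoding stores an integer in 7-bit groups, using the high bit of each byte as a continuation flag, so small values take fewer bytes. The sketch below is a scalar version that emits the least significant group first (the ordering used by LEB128). It does not reproduce the SIMD variant proposed in the paper, and the function names are hypothetical.

```python
def vlq_encode(value: int) -> bytes:
    """Encodes a non-negative integer as 7-bit groups, low group first."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)         # high bit clear: last byte
            return bytes(out)


def vlq_decode(data: bytes) -> int:
    """Decodes the first VLQ-encoded integer found in data."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value
        shift += 7
    raise ValueError("truncated VLQ sequence")
```

A vertex ID as large as 2^35 fits in six bytes under this scheme, compared with eight bytes for a fixed 64-bit integer, which is why such encodings are attractive for communication-bound workloads.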

conference on high performance computing (supercomputing) | 2001

A Jini-Based Computing Portal System

Toyotaro Suzumura; Satoshi Matsuoka; Hidemoto Nakada

JiPANG (A Jini-based Portal Augmenting Grids) is a portal system and a toolkit which provides a uniform access interface layer to a variety of Grid systems, and is built on top of Jini distributed object technology. JiPANG performs uniform higher-level management of the computing services and resources being managed by individual Grid systems such as Ninf, NetSolve, Globus, etc. In order to give the user a uniform interface to the Grids, JiPANG provides a set of simple Java APIs called the JiPANG Toolkits, and furthermore allows the user to interact with Grid systems, again in a uniform way, using the JiPANG Browser application. With JiPANG, users need not install any client packages beforehand to interact with Grid systems, nor be concerned about updating to the latest version. We believe that such uniform, transparent services, available in a ubiquitous manner, are essential for the success of the Grid as a viable computing platform for the next generation.

international world wide web conferences | 2009

HTML templates that fly: a template engine approach to automated offloading from server to client

Michiaki Tatsubori; Toyotaro Suzumura

Web applications often use HTML templates to separate the webpage presentation from its underlying business logic and objects. This is now the de facto standard programming model for Web application development. This paper proposes a novel implementation for existing server-side template engines, FlyingTemplate, for (a) reduced bandwidth consumption in Web application servers, and (b) off-loading HTML generation tasks to Web clients. Instead of producing a fully-generated HTML page, the proposed template engine produces a skeletal script which includes only the dynamic values of the template parameters and the bootstrap code that runs on a Web browser at the client side. It retrieves a client-side template engine and the payload templates separately. With the goals of efficiency, implementation transparency, security, and standards compliance in mind, we developed FlyingTemplate with two design principles: effective browser cache usage, and reasonable compromises which restrict the template usage patterns and relax the security policies slightly but in a controllable way. This approach allows typical template-based Web applications to run effectively with FlyingTemplate. As an experiment, we tested the SPECweb2005 banking application using FlyingTemplate without any other modifications and saw throughput improvements from 1.6x to 2.0x in its best mode. In addition, FlyingTemplate can enforce compliance with a simple security policy, thus addressing the security problems of client-server partitioning in the Web environment.
