Yoshiki Seo | Researchain

Publication

Featured research published by Yoshiki Seo.

conference on high performance computing (supercomputing) | 2002

14.9 TFLOPS Three-Dimensional Fluid Simulation for Fusion Science with HPF on the Earth Simulator

Hitoshi Sakagami; Hitoshi Murai; Yoshiki Seo; Mitsuo Yokokawa

We succeeded in getting 14.9 TFLOPS performance when running a plasma simulation code IMPACT-3D parallelized with High Performance Fortran on 512 nodes of the Earth Simulator. The theoretical peak performance of the 512 nodes is 32 TFLOPS, which means 45% of the peak performance was obtained with HPF. IMPACT-3D is an implosion analysis code using a TVD scheme, which performs three-dimensional compressible and inviscid Eulerian fluid computation with the explicit 5-point stencil scheme for spatial differentiation and the fractional time step for time integration. The mesh size is 2048x2048x4096, and the third dimension was distributed for the parallelization. The HPF system used in the evaluation is HPF/ES, developed for the Earth Simulator by enhancing NEC HPF/SX V2 mainly in communication scalability. Shift communications were manually tuned to get the best performance by using HPF/JA extensions, which were designed to give the users more control over sophisticated parallelization and communication optimizations.
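The 45% figure can be checked with a short calculation. The sketch below is illustrative only: the abstract gives the node count and the sustained rate, while the per-node figures (8 processors per node, 8 GFLOPS per processor) are assumptions taken from the machine's commonly published specifications. The function names are hypothetical.

```python
# Assumed Earth Simulator specifications (NOT stated in the abstract above).
CPUS_PER_NODE = 8        # vector processors per node (assumption)
GFLOPS_PER_CPU = 8.0     # peak GFLOPS per processor (assumption)


def peak_tflops(nodes: int) -> float:
    """Theoretical peak of `nodes` nodes, in TFLOPS."""
    return nodes * CPUS_PER_NODE * GFLOPS_PER_CPU / 1000.0


def efficiency(sustained_tflops: float, nodes: int) -> float:
    """Fraction of the theoretical peak that was actually sustained."""
    return sustained_tflops / peak_tflops(nodes)


if __name__ == "__main__":
    peak = peak_tflops(512)        # 512 nodes, as in the abstract
    eff = efficiency(14.9, 512)    # 14.9 TFLOPS sustained, as in the abstract
    print(f"peak = {peak:.3f} TFLOPS, efficiency = {eff:.1%}")
    # prints: peak = 32.768 TFLOPS, efficiency = 45.5%
```

Under these assumptions the peak is 32.768 TFLOPS, which the abstract rounds to 32, and the ratio comes out at about 45%.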

international conference on supercomputing | 1998

Integer sorting on shared-memory vector parallel computers

Kenji Suehiro; Hitoshi Murai; Yoshiki Seo

This paper describes new fast integer sorting methods for single vector and shared-memory parallel vector computers, based on the bucket sort algorithm. Existing vectorization methods for bucket sort have made great efforts to avoid store conflicts of vector scatter operations, and therefore are not so efficient. The vectorization methods shown in this paper (the retry method, the split vector method, and the mask vector method) all actively utilize the nature of the store conflicts to achieve high performance. The parallelization method in this paper uses a feature of shared-memory machines and dynamically changes the partitioning of histogram arrays without any overhead. By combining the retry and the parallelization methods, we got the world's fastest results for the IS program (Class B) in the NAS Parallel Benchmarks on the NEC SX-4. Our methods are also applicable to a wide range of particle simulation programs.

Concurrency and Computation: Practice and Experience | 2002

HPF/JA: extensions of High Performance Fortran for accelerating real‐world applications

Yoshiki Seo; Hidetoshi Iwashita; Hiroshi Ohta; Hitoshi Sakagami

This paper presents a set of extensions on High Performance Fortran (HPF) to make it more usable for parallelizing real‐world production codes. HPF has been effective for programs that a compiler can automatically optimize efficiently. However, once the compiler cannot, there have been no ways for the users to explicitly parallelize or optimize their programs. In order to resolve the situation, we have developed a set of HPF extensions (HPF/JA) to give the users more control over sophisticated parallelization and communication optimizations. They include parallelization of loops with complicated reductions, asynchronous communication, user‐controllable shadow, and communication pattern reuse for irregular remote data accesses. Preliminary experiments have proved that the extensions are effective at increasing HPF's usability.

international conference on cloud computing | 2013

Dragonfly: Cloud Assisted Peer-to-Peer Architecture for Multipoint Media Streaming Applications

Erdinc Korpeoglu; Cetin Sahin; Divyakant Agrawal; Amr El Abbadi; Takeo Hosomi; Yoshiki Seo

Technology trends are not only transforming the hardware landscape of end-user devices but are also dramatically changing the types of software applications that are deployed on these devices. With the maturity of cloud computing during the past few years, users increasingly rely on networked applications that are deployed in the cloud. In particular, new applications will emerge where user interactions will be based on real-time continuous media streams instead of the traditional request-response types of interfaces. Furthermore, many of these applications will be multi-user streaming media based interactions instead of a single user interaction with an application. In this paper, we propose a geographic location-aware, hybrid, scalable cloud-assisted peer-to-peer (P2P) architecture to support such applications that targets low administration cost, reduced bandwidth consumption, low latency, low initial investment cost and optimized resource usage. The main objective is to develop an efficient media delivery system that leverages locality. We propose a novel 3-layer architecture that uses the cloud at the core for application management, a 2-tier edge cloud for supporting geo-dispersed user groups, and, at the lowest level, peer-to-peer dynamic overlays for locally clustered user groups. The proposed architecture manages multiple streaming sessions simultaneously and each streaming session is an independent entity. Our experiments on PlanetLab show that the dynamic construction and maintenance of delivering streams at both the user-level P2P overlay and edge cloud are indeed feasible and effective.

international conference on parallel processing | 1996

Generating realignment-based communication for HPF programs

Tsunehiko Kamachi; Kazuhiro Kusano; Kenji Suehiro; Yoshiki Seo; Masanori Tamura; Shoichi Sakon

This paper presents methods for generating communication when compiling HPF programs for distributed-memory machines. We introduce the concept of an iteration template corresponding to an iteration space. Our HPF compiler performs the loop iteration mapping through the two-level mapping of the iteration template in the same way as the data mapping is performed in HPF. Making use of this unified mapping model of the data and the loops, communication for nonlocal accesses is handled based on data-realignment between the user-declared alignment and the optimal alignment, which ensures that only local accesses occur inside the loop. This strategy results in an effective means of dealing with communication for arrays with undefined mapping, a simple manner of generating communication, and high portability of the HPF compiler. Experimental results on the NEC Cenju-3 distributed-memory machine demonstrate the effectiveness of our approach: the execution time of the compiler-generated program was within 10% of that of the hand-parallelized program.

Archive | 1994

Static Performance Prediction in PCASE: A Programming Environment for Parallel Supercomputers

Yoshiki Seo; Tsunehiko Kamachi; Yukimitsu Watanabe; Kazuhiro Kusano; Kenji Suehiro; Yukimasa Shiroto

This paper presents a performance estimator, a prototype of which is implemented within the parallel programming environment PCASE. The estimation is based on static performance prediction, using not only the design information of target machines but also benchmarking results. Additionally, communication costs are estimated based on a hierarchical memory machine model, which enables users to understand all the underlying communication costs on distributed memory machines for each parallel loop. With this performance suggestion, it is possible to interactively optimize data distribution and appropriately select vectorized or parallelized loops. Moreover, the skeleton profiling method is presented. It makes high-speed trace (execution count of each statement) generation possible by deleting statements which do not affect the execution path of a program.

Scientific Programming | 1997

Kemari: a portable High Performance Fortran system for distributed memory parallel processors

Tsunehiko Kamachi; Andreas Müller; Roland Rühl; Yoshiki Seo; Kenji Suehiro; M. Tamura

We have developed a compilation system which extends High Performance Fortran (HPF) in various aspects. We support the parallelization of well-structured problems with loop distribution and alignment directives similar to HPF's data distribution directives. Such directives both give additional control to the user and simplify the compilation process. For the support of unstructured problems, we provide directives for dynamic data distribution through user-defined mappings. The compiler also allows integration of message-passing interface (MPI) primitives. The system is part of a complete programming environment which also comprises a parallel debugger and a performance monitor and analyzer. After an overview of the compiler, we describe the language extensions and related compilation mechanisms in detail. Performance measurements demonstrate the compiler's applicability to a variety of application classes.

Concurrency and Computation: Practice and Experience | 2002

Implementation and evaluation of HPF/SX V2

Hitoshi Murai; Takuya Araki; Yasuharu Hayashi; Kenji Suehiro; Yoshiki Seo

We are developing HPF/SX V2, a High Performance Fortran (HPF) compiler for vector parallel machines. It provides some unique extensions as well as the features of HPF 2.0 and HPF/JA. In particular, this paper describes four of them: (1) the ON directive of HPF 2.0; (2) the REFLECT and LOCAL directives of HPF/JA; (3) vectorization directives; and (4) automatic parallelization. We evaluate these features through some benchmark programs on NEC SX‐5. The results show that each of them achieved a 5–8 times speedup in 8‐CPU parallel execution and the four features are useful for vector parallel execution. We also evaluate the overall performance of HPF/SX V2 by using over 30 well‐known benchmark programs from HPFBench, APR Benchmarks, GENESIS Benchmarks, and NAS Parallel Benchmarks. About half of the programs showed good performance, while the other half suggest weaknesses of the compiler, especially in its runtimes. It is necessary to improve them to put the compiler to practical use.

ieee international conference on high performance computing data and analytics | 2002

Optimization of HPF Programs with Dynamic Recompilation Technique

Takuya Araki; Hitoshi Murai; Tsunehiko Kamachi; Yoshiki Seo

Optimizing compilers perform various optimizations in order to extract the best performance from computer systems. However, some kinds of optimizations cannot be applied if values of variables or system parameters are not known at compilation time. To solve this problem, we designed and implemented a system which collects such information at run time, and dynamically recompiles part of the program based on it. In our system, recompilation and management of runtime information are carried out on processors other than those which execute user programs. Therefore, recompilation cost does not affect the program execution time, unlike other similar systems. The evaluation result shows that quite high speedup can be attained with this method.

Archive | 1995

PCASE: A Programming Environment for Parallel Supercomputers

Yoshiki Seo; Tsunehiko Kamachi; Kazuhiro Kusano; Yukimitsu Watanabe; Yukimasa Shiroto
