Munenori Kai
Seikei University
Publication
Featured research published by Munenori Kai.
pacific rim conference on communications, computers and signal processing | 2009
Takahiro Kondoh; Fumiaki Kato; Munenori Kai
The authors have been developing a strong-migration mobile agent system in Java. Using this system, we are developing the platform of an autonomic distributed processing system called AgentSphere. In this research, we implement a mechanism by which an agent can create a backup of itself, including its data, in the middle of execution. To use this mechanism, a user writes backup commands in the agent's code. The backed-up agent is sent to another AgentSphere, and when the original agent stops due to an unexpected situation, the backed-up agent starts its activity in place of the original in order to resume its processing. Moreover, a method for automatically inserting backup commands at suitable positions in an agent's code is proposed. This paper also describes the implementation of a scheduler that performs the initial distribution of agents, and of the communication functions between agents in AgentSphere.
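The backup mechanism described above can be sketched as follows. This is a hypothetical illustration only: the class, the method names, and the in-memory backup store are all stand-ins for AgentSphere's real API (which is not shown) and for sending a serialized copy of the agent to another node.

```java
import java.io.*;
import java.util.List;

// Hypothetical sketch of the checkpoint idea: the agent serializes itself
// (working data plus a resume point) at user-written backup commands, and a
// copy revived from the latest backup continues where the original stopped.
class CheckpointAgent implements Serializable {
    int step = 0;        // resume point, saved together with the data
    long partialSum = 0; // example working data

    // Do the work in units; a backup command follows each unit, as the
    // user would write it in the agent's code.
    void work(List<byte[]> backupStore, int limit) {
        while (step < limit) {
            partialSum += step;   // the "real" work of this step
            step++;               // advance the resume point first...
            backup(backupStore);  // ...then checkpoint the whole agent
        }
    }

    // Serialize the agent into the backup store, standing in for sending
    // the backed-up agent to another AgentSphere.
    void backup(List<byte[]> store) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new ObjectOutputStream(buf).writeObject(this);
            store.add(buf.toByteArray());
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    // When the original agent is lost, revive the latest backup.
    static CheckpointAgent resume(List<byte[]> store) {
        try {
            byte[] latest = store.get(store.size() - 1);
            return (CheckpointAgent) new ObjectInputStream(
                    new ByteArrayInputStream(latest)).readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that the resume point is advanced before each checkpoint, so a revived backup never repeats work the original already completed.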
pacific rim conference on communications, computers and signal processing | 2007
Yasuki Sakurai; Masatoshi Takubo; Fumiaki Kato; Munenori Kai
In order to maintain high efficiency, high reliability, and fault tolerance in a distributed processing system, we have been developing an autonomic distributed processing system using a Java-based mobile agent system. Since our system used a mobile agent system based on weak migration, it was difficult for users to freely write an agent program that can resume its process on the destination computer from the point at which it stopped just before moving. In this paper, we propose a new source code transformation method for strong-migration mobile agent code. The method transforms source code that is easy for users to write into code that can be executed on ordinary JavaVMs (Java Virtual Machines).
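The transformation above can be illustrated by the kind of code it might produce: the user's straight-line agent body is rewritten into a state machine whose saved program counter and hoisted locals survive serialization, so an ordinary JavaVM can resume the method part-way through. The class, fields, and exact rewriting here are illustrative, not the paper's actual transformer output.

```java
// Illustrative result of a strong-migration source transformation: locals
// become fields and the body becomes a switch over a saved program counter,
// so that serializing the object at a pause captures the execution state.
class ResumableAgent implements java.io.Serializable {
    int pc = 0;   // saved program counter: where to resume
    int x, y;     // locals hoisted into fields so they migrate with the agent

    // Returns true when finished; false when pausing at a migration point.
    boolean run() {
        switch (pc) {
            case 0:
                x = 10;
                pc = 1;
                return false;   // migration point: state is fully captured
            case 1:
                y = x * 2;
                pc = 2;
                return false;   // another migration point
            case 2:
                x = x + y;      // resumes exactly after the last point
                pc = 3;
        }
        return true;
    }
}
```

Each `return false` is a point where the runtime could serialize the agent, ship it to another JVM, and call `run()` again there.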
pacific rim conference on communications, computers and signal processing | 2011
Masahiko Utsunomiya; Ryuji Shioda; Munenori Kai
Task scheduling problems belong to the class of NP-complete or strongly NP-complete combinatorial optimization problems. Taking communication overhead in task graphs into account makes these problems harder to solve by a depth-first search algorithm based on the branch-and-bound method, because the search space becomes much larger. In this paper, we propose an extended version of the original priority levels used in the Critical Path Method. Calculating a more accurate priority level for each task while accounting for communication time among tasks is itself another combinatorial optimization problem, so we propose a new heuristic method that computes each accurate priority level in practically short time. By using the newly proposed priority level both as the heuristic and in the bounding operations of the search, it is shown that the scheduling time becomes shorter than that of the branch-and-bound search with the original priority level.
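For reference, the classic Critical Path Method priority level extended with edge communication costs can be computed as below: a task's level is its own processing time plus the largest (communication + level) over its successors, pessimistically assuming every edge crosses processors. This is only a simple stand-in for the paper's more accurate heuristic level, which is not shown.

```java
import java.util.List;

// Communication-aware variant of the CPM priority (bottom) level,
// computed by memoized depth-first recursion over the task DAG.
class PriorityLevel {
    // succ.get(t) holds {successorTask, communicationCost} pairs.
    static int level(int t, int[] work, List<List<int[]>> succ, int[] memo) {
        if (memo[t] != 0) return memo[t];   // 0 is safe since work[t] > 0
        int best = 0;
        for (int[] e : succ.get(t))
            best = Math.max(best, e[1] + level(e[0], work, succ, memo));
        return memo[t] = work[t] + best;    // own time + longest path below
    }
}
```

The memoized recursion runs in O(tasks + edges), which is why a heuristic level of this kind is cheap compared with the exact level, which would require solving a scheduling subproblem per task.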
pacific rim conference on communications, computers and signal processing | 1999
Munenori Kai; M. Shimada
We propose task scheduling algorithms that take communication overhead into account. Conventional search-based algorithms cannot find solutions in practical time due to combinatorial explosion. Our algorithms can find better solutions in practical time than heuristic algorithms because, guided by good heuristics, they search only the regions of the search space where better solutions are likely to exist. The first solution candidate found by the grouping priority levels (GP) method is the same as the solution given by the heuristic algorithm with no search, so it is guaranteed to find solutions better than or equal to those given by the heuristic algorithm.
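The search pattern described above, where the first leaf reached by the depth-first search is exactly the greedy heuristic's schedule and later branches can only match or improve it, can be shown on a deliberately tiny problem: independent tasks on two processors with no communication. This is an illustration of the general branch-and-bound idea, not the GP method itself.

```java
// Minimal branch-and-bound sketch: branch greedily first (task goes to the
// less loaded processor), so the first complete assignment equals the
// greedy schedule; its makespan then bounds the rest of the search.
class BnBScheduler {
    int best = Integer.MAX_VALUE;   // best makespan found so far

    int solve(int[] work) {
        int[] w = work.clone();
        java.util.Arrays.sort(w);
        // reverse to longest-task-first, the greedy priority order
        for (int i = 0, j = w.length - 1; i < j; i++, j--) {
            int t = w[i]; w[i] = w[j]; w[j] = t;
        }
        dfs(w, 0, 0, 0);
        return best;
    }

    void dfs(int[] w, int i, int p0, int p1) {
        int makespan = Math.max(p0, p1);
        if (makespan >= best) return;               // bound: prune branch
        if (i == w.length) { best = makespan; return; }
        // greedy-first branching: less loaded processor is tried first
        if (p0 <= p1) { dfs(w, i + 1, p0 + w[i], p1); dfs(w, i + 1, p0, p1 + w[i]); }
        else          { dfs(w, i + 1, p0, p1 + w[i]); dfs(w, i + 1, p0 + w[i], p1); }
    }
}
```

Because `best` is set at the first (greedy) leaf, the search can only return a schedule at least as good as the no-search heuristic, mirroring the guarantee stated in the abstract.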
pacific rim conference on communications, computers and signal processing | 2015
Shibuya Tomonori; Munenori Kai
In order to execute a parallel program represented as a task graph in the shortest time, task scheduling is important. Task scheduling is, however, a strongly NP-hard combinatorial optimization problem, and it is therefore difficult to obtain an optimal solution in a practical period of time. For problems extended to take inter-processor communication delay into consideration, obtaining the optimal solution is even more time-consuming. For these problems, the authors perform a parallelized optimal-solution search based on the branch-and-bound method. For an effective and high-speed search, the bounding operation needs a lower bound on the processing time from each task to the end of the final task, computed with consideration of the communication time between tasks. In this study, we developed a method for improving the accuracy of this lower bound, and a GUI tool that allows manual intervention in the search order of the parallel search in order to find better heuristics for the search.
pacific rim conference on communications, computers and signal processing | 2009
Kazuhiro Saito; Hiroko Midorikawa; Munenori Kai
The Distributed Large Memory system, DLM, was designed to provide a memory size beyond that of local physical memory by using remote memory distributed over cluster nodes. The original DLM adopted a low-cost page replacement algorithm that selects the page to evict in address order. In the DLM, remote page swapping is the most performance-critical operation. For more efficient selection of swap-out pages, we propose a new page replacement algorithm that pays attention to swap-in history. LRU and other algorithms that use memory-access history impose heavy overhead on user-level software, which must record every memory access; using swap-in history, on the other hand, incurs little cost. According to our performance evaluation, the new algorithm reduces the number of remote swaps by up to 32% and achieves 2.7 times higher performance in a real application, Cluster3.0. In this paper, we describe the design of the new page replacement algorithm and evaluate its performance on several applications, including NPB and HimenoBmk.
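The cost argument above can be made concrete with a sketch: rather than recording every access (LRU) or evicting in address order, the pager keeps only the order in which pages were swapped in, events a user-level pager already observes for free, and evicts the page swapped in longest ago. This is an illustration of the idea, not the DLM's actual pager.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

// Swap-in-history replacement: bookkeeping happens only on page faults
// (swap-ins), so local hits cost nothing extra -- unlike LRU, which must
// update its ordering on every single access.
class SwapInHistoryPager {
    final int capacity;                                   // local page frames
    final ArrayDeque<Integer> history = new ArrayDeque<>(); // swap-in order
    final Set<Integer> resident = new HashSet<>();
    int remoteSwaps = 0;

    SwapInHistoryPager(int capacity) { this.capacity = capacity; }

    void access(int page) {
        if (resident.contains(page)) return;      // local hit: not recorded
        remoteSwaps++;                            // fault: swap page in
        if (resident.size() == capacity) {
            int victim = history.pollFirst();     // page swapped in longest ago
            resident.remove(victim);              // swap victim out
        }
        resident.add(page);
        history.addLast(page);                    // record the swap-in only
    }
}
```

The eviction order here amounts to FIFO over swap-in events; the paper's contribution is showing that this cheap signal is a good enough proxy for recency in the DLM's workloads.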
pacific rim conference on communications, computers and signal processing | 2005
Ryusuke Sasaki; Masatoshi Takubo; Munenori Kai
In response to dynamic changes in a system, an autonomic distributed processing system can automatically distribute tasks efficiently and can reissue lost tasks. Mobile agents are well suited to realizing such a system. In this research, the functionality of the mobile agent system AgentSpace was extended and used, and agents that support efficient processing of user tasks were implemented. To evaluate our system, an SPMD-type multithreaded program for the 13-city traveling salesperson problem was used; as a result of executing it on four heterogeneous PCs, the processing time was shortened to about 35.5% of the single-PC time on average.
pacific rim conference on communications, computers and signal processing | 1997
Munenori Kai
It is noted that conventional list scheduling algorithms based on the critical path method keep processor utilization low in cases where communication overhead is considered. A novel heuristic algorithm is proposed that contributes to saving processor resources and improving processor utilization. The scheduling results show that the algorithm works well: it gives almost the same schedule length with half the number of processors compared with a conventional scheduling algorithm. This means that the throughput of the system can be raised by executing other applications on the remaining processors.
pacific rim conference on communications, computers and signal processing | 2017
Motoki Hasegawa; Munenori Kai
One of the key methods for optimizing parallel processing of a program is task scheduling, which aims to minimize the execution time of the entire program by determining a schedule that optimally allocates tasks, the processing units comprising the program, to the available processor elements. Because of its computational complexity and scale, this optimization problem is extremely challenging to solve with current algorithms in a practical amount of time. The authors studied a method of partially optimizing the task graph as a task scheduling technique for large-scale problems. This paper reports on a scheduling method that supports hierarchical macro-task groups.
pacific rim conference on communications, computers and signal processing | 2017
Hikari Oura; Hiroko Midorikawa; Kenji Kitagawa; Munenori Kai
A remote memory paging system called the distributed large memory (DLM) has been developed, which uses the memories of remote nodes in a cluster as a main-memory extension of the local node. The DLM is available for out-of-core processing, i.e., processing of data larger than the main memory capacity of the local node. Using the DLM and memory servers, it is possible to run multi-threaded programs written in OpenMP and pthreads on large-scale problems on a computation node whose main memory capacity is smaller than the problem data size. The page swap protocol and its implementation are significant factors in the performance of remote memory paging systems. The current version of the DLM has a bottleneck in efficient page swapping because all communication management between the memory servers and the local computation node is allocated to one system thread. This paper proposes two new page swap protocols and implementations that introduce an additional system thread to alleviate this situation. They are evaluated with a micro-benchmark, the Stream benchmark, and a 7-point stencil computation program. As a result, the proposed protocol improves the performance degradation ratio, i.e., the performance using the DLM divided by the performance using only local memory, from 57% with the former protocol to 78% in the stencil computation, which processes data four times larger than the local memory capacity.
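The protocol change described above can be sketched with two dedicated threads: one drains an eviction queue (swap-out) while the other serves fetch requests (swap-in), so neither direction of the page traffic waits on the other. The maps below stand in for local memory and a remote memory server; everything here is illustrative, not the DLM's real implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

// Two-system-thread page swapping: swap-out and swap-in each get their own
// thread and queue, instead of one thread serializing both directions.
class TwoThreadSwapper {
    final ConcurrentMap<Integer, byte[]> local  = new ConcurrentHashMap<>();
    final ConcurrentMap<Integer, byte[]> server = new ConcurrentHashMap<>();
    final BlockingQueue<Integer> evictQ = new LinkedBlockingQueue<>();
    final BlockingQueue<Integer> fetchQ = new LinkedBlockingQueue<>();

    // Process nOps requests on each queue, concurrently.
    void run(int nOps) throws InterruptedException {
        Thread swapOut = new Thread(() -> {       // dedicated swap-out thread
            for (int i = 0; i < nOps; i++) {
                try {
                    int page = evictQ.take();
                    server.put(page, local.remove(page)); // page leaves local
                } catch (InterruptedException e) { return; }
            }
        });
        Thread swapIn = new Thread(() -> {        // dedicated swap-in thread
            for (int i = 0; i < nOps; i++) {
                try {
                    int page = fetchQ.take();
                    local.put(page, server.remove(page)); // page arrives
                } catch (InterruptedException e) { return; }
            }
        });
        swapOut.start(); swapIn.start();
        swapOut.join();  swapIn.join();
    }
}
```

With a single thread, the swap-out for a fault's victim page and the swap-in of the requested page would be serialized; here they proceed in parallel, which is the overlap the paper's new protocols aim to exploit.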