Mamoru Sugie | Researchain

Publication

Featured research published by Mamoru Sugie.

International Conference on Supercomputing | 1992

Evaluation of the lock mechanism in a snooping cache

Toshiaki Tarui; Takayuki Nakagawa; Noriyasu Ido; Machiko Asaie; Mamoru Sugie

This paper discusses the design concepts of a lock mechanism for a Parallel Inference Machine (the PIM/c prototype) and investigates the performance of the mechanism in detail. Lock operations are extremely frequent on the PIM; however, lock contention rarely occurs during normal memory usage. For this reason, the lock mechanism is designed so as to minimize the lock overhead time in the case of no contention. This is done by using an invalidation lock mechanism, which utilizes the exclusive state of the snooping cache and in which the locked address is not broadcast. Experimental results demonstrate the benefits of the lock mechanism in regions of few lock contentions. They also confirm that, in most cases, the lock mechanism works well on the PIM. However, the mechanism is also found to cause performance degradation when a locked address is accessed by multiple processing elements (PEs) in a tightly-coupled multi-processor (TCMP). This is because shared data such as the flags for inter-PE communication, which are shared by all the PEs, may be accessed by multiple PEs at the same time, thus generating heavy contention. This paper also shows that combining a register-based broadcasting facility with the proposed lock mechanism can solve the above problem.
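The no-contention fast path described in this abstract can be illustrated with a toy model. The sketch below is not the PIM/c hardware protocol: the class names, the three line states, and the rule that a lock holder simply refuses a snooped request are all assumptions made for illustration. It only shows why a lock taken on a line already held in the exclusive state needs no bus transaction.

```python
from dataclasses import dataclass, field

# Hypothetical line states for this sketch (not the PIM/c state set).
INVALID, SHARED, EXCLUSIVE = "I", "S", "E"

@dataclass
class Cache:
    state: dict = field(default_factory=dict)  # address -> line state
    locked: set = field(default_factory=set)   # addresses locked by this PE

class Bus:
    def __init__(self, n_pes):
        self.caches = [Cache() for _ in range(n_pes)]
        self.transactions = 0  # bus transactions issued so far

    def lock(self, pe, addr):
        """Try to lock addr for processing element pe; return True on success."""
        me = self.caches[pe]
        if me.state.get(addr, INVALID) != EXCLUSIVE:
            # Exclusive ownership is needed: one bus transaction, snooped by all.
            self.transactions += 1
            for other in self.caches:
                if other is not me and addr in other.locked:
                    return False  # assumed rule: the holder refuses, requester retries
            for other in self.caches:
                if other is not me:
                    other.state[addr] = INVALID
            me.state[addr] = EXCLUSIVE
        # The line is exclusive here, so the lock is set locally, with no broadcast.
        me.locked.add(addr)
        return True

    def unlock(self, pe, addr):
        self.caches[pe].locked.discard(addr)
```

In this model, a PE that repeatedly locks and unlocks the same address pays for one bus transaction in total, whereas two PEs alternating on one address pay for a transaction on every attempt, which mirrors the contention case the abstract warns about.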

International Conference on Parallel Processing | 2000

A scalable, cost-effective, and flexible disk system using high-performance embedded-processors

Aki Tomita; Naoki Watanabe; Yoshifumi Takamoto; Shigekazu Inohara; Frederico Buchholz Maciel; Hiroaki Odawara; Mamoru Sugie

As a scalable, cost-effective, and flexible solution for data-intensive systems, we are exploring active-network-storage (ANS), which is an array of ANS disk drives. The ANS drive improves flexibility by using a modular software design; that is, users can specify functions of the ANS drive by loading/unloading the corresponding modules on it. To keep the ANS drive cost-effective, users are allowed to choose whether native code modules or platform-independent Java-bytecode modules are executed on the drive. We forecast that a current high-performance embedded-processor is powerful enough to enable this modular design to be implemented and to provide a scalable, cost-effective, and flexible ANS system. We have confirmed our forecast by conducting an experiment with an ANS drive prototype with a 200 MHz embedded-processor running database sequential scanning and NFS, which are typical off-loaded functions with different characteristics. To evaluate scalability and cost-effectiveness of the ANS system, we estimated the throughput from measurements on our ANS prototype, and we compared it with the throughput that was measured on a 450 MHz Pentium II Xeon server. Our estimation indicates that the scan throughput of the ANS system increases up to 71 MB/s while that of the server saturates at 25 MB/s because of its CPU bottleneck. The NFS read/write throughputs of two ANS drives surpassed the server maximum throughputs.

International Conference on Parallel Processing | 1994

Evaluation of the Cluster Structure on the PIM/C Parallel Inference Machine

Toshiaki Tarui; Machiko Asaie; Noriyasu Ido; Takayuki Nakagawa; Mamoru Sugie

The characteristics of a cluster-structure parallel computer are analyzed and evaluated on the PIM/c parallel inference machine, which consists of eight-processor shared-memory clusters communicating through a processor connected to a network. To avoid communication bottlenecks, the maximum number of processors in a cluster is limited by the ratio of communication operations to program-execution operations. Since this ratio can be as high as 30% on the PIM/c, the network receiving operations should be distributed to processors in the same cluster.
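A back-of-the-envelope reading of this bound, under a simplifying assumption that is not taken from the paper: a single processor handles all network receiving for the cluster, and each processor generates `comm_ratio` units of communication work per unit of execution work. The function names are hypothetical.

```python
import math

def handler_load(n_processors, comm_ratio):
    """Load on the single communication handler (1.0 means saturated)."""
    return n_processors * comm_ratio

def max_cluster_size(comm_ratio):
    """Largest cluster size that keeps the single handler at or below saturation."""
    return math.floor(1 / comm_ratio)

# With the 30% ratio quoted in the abstract and an eight-processor cluster:
print(handler_load(8, 0.3))   # well above 1.0 in this model
print(max_cluster_size(0.3))  # cluster size the single handler could sustain
```

In this simplified model a single handler cannot keep up with an eight-processor cluster at a 30% ratio, which is consistent with the abstract's recommendation to distribute the receiving operations across the cluster's processors.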

IEEE Transactions on Parallel and Distributed Systems | 1994

A concurrent test architecture for massively parallel computers and its error detection capability

Marius V. A. Hancu; Kazuhiko Iwasaki; Yuji Sato; Mamoru Sugie

Presents new principles for online monitoring in the context of multiprocessors (especially massively parallel processors) and then focuses on the effect of the aliasing probability on the error detection process. In the proposed test architecture, concurrent testing (or online monitoring) at the system level is accomplished by enforcing the run-time testing of the data and control dependences of the algorithm currently being executed on the parallel computer. In order to help in this process, each message contains both source and destination addresses. At each message source, the sequence of destination addresses of the outgoing messages is compressed on a block basis. At the same time, at each destination, the sequence of source addresses of all incoming messages is compressed, also on a block basis. Concurrent compression of the instructions executed by the PEs is also possible. As a result of this procedure, an image of the data dependences and of the control flow of the currently running algorithm is created. This image is compared, at the end of each computational block, with a reference image created at compilation time. The main results of this work are in proposing new principles for the online system-level testing of multiprocessor systems, based on signaturing and monitoring the data dependences together with the control dependences, and in providing an analytical model and analysis for the address compression process used for monitoring the data routing process.
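The address-compression step can be sketched with a small software model of a multiple-input signature register (MISR). This is an illustrative sketch only: the 8-bit width, the tap mask, and the function names are assumptions, not parameters from the paper.

```python
def misr_step(state, word, taps, width):
    """One MISR clock: shift with LFSR feedback, then XOR in the input word."""
    mask = (1 << width) - 1
    feedback = bin(state & taps).count("1") & 1  # parity of the tapped bits
    state = ((state << 1) | feedback) & mask
    return state ^ (word & mask)

def signature(addresses, taps=0b10111000, width=8):
    """Compress a block of addresses into one signature (assumed 8-bit MISR)."""
    state = 0
    for address in addresses:
        state = misr_step(state, address, taps, width)
    return state

# Reference signature, standing in for the one created at compilation time.
reference = signature([3, 7, 1, 4])

# Run-time check at the end of a computational block.
observed_ok = signature([3, 7, 1, 4])
observed_bad = signature([3, 7, 2, 4])  # one destination address corrupted
print(observed_ok == reference, observed_bad == reference)
```

Because the compressor is linear and the tap mask includes the bit that is shifted out, a single corrupted address always changes the signature in this model. Multiple errors can cancel each other out, which is the aliasing effect whose probability the paper analyzes.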

Parallel Computing | 1992

Experimental results on the error detection capability of a concurrent test architecture for massively-parallel computers

Marius V. A. Hancu; Kazuhiko Iwasaki; Yuji Sato; Mamoru Sugie

In a previous paper, we introduced a new concurrent testing (or on-line monitoring) architecture for Massively-Parallel Computers. In the proposed test architecture, on-line checks for both control flow and data routing are accomplished by enforcing the run-time test of compressed (signatured) versions of the control and data dependences of the algorithm executed in the parallel computer. This paper focuses on the results of simulation experiments on the error detection of the proposed test architecture as applied to the routing process. Four sets of experiments were executed, with two compressors or signature analyzers (an MISR and an LFSR) and two error models (the 2^m-ary and the Binary Symmetric Channel). Using a randomized routing process and a randomized fault insertion, we have obtained detailed figures for the undetected errors at all crucial detecting points of our proposed detection method: the source, the expected destination and the false destination of the messages. High detection ratios for multiple errors were obtained for compressors of only moderate size, supporting the use of this method in practical applications. The results are independent of the topology of the interconnection network and the detailed routing algorithm.

International Test Conference | 1991

A concurrent test architecture for massively-parallel computers and its error detection capability

Marius V. A. Hancu; Kazuhiko Iwasaki; Yuji Sato; Mamoru Sugie

New principles for the on-line system-level test of multiprocessors are proposed, based on signaturing and monitoring data dependences together with control dependences. In order to help in this process, each data routing message contains both source and destination addresses. At each message source, the destination addresses of the outgoing messages are compressed. At the same time, at each destination, the source addresses of all incoming messages are compressed. Concurrent compression of the instructions executed by the PEs is also possible. The resulting signatures are compared at the end of each computational block with reference signatures created at compilation time. An analytical model and an analysis for the address compression process used for monitoring the data routing process are provided. The aliasing probability for the error detection process is studied, obtaining closed-form expressions in the single error case and upper bounds in the multiple error case.

Conference on Logic Programming | 1986

Hardware simulator of reduction-based parallel inference machine: PIM-R

Mamoru Sugie; M. Yoneyama; T. Sakabe; M. Iwasaki; S. Yoshizumi; Moritoshi Aso; Hajime Shimizu; Rikio Onai

A hardware simulator of PIM-R (Reduction-Based Parallel Inference Machine) has been developed. Eight MC68000 single board computers and shared storage operate as the inference modules and the network, respectively. In order to realize a high simulation rate, an event-driven method is introduced. The "Queens" and "Quicksort" programs were executed on the simulator. The results show that a PIM-R architecture can effectively utilize the parallelism in Prolog/Concurrent Prolog programs.
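The event-driven method mentioned in this abstract can be illustrated in software. The sketch below shows the general technique only, not the PIM-R simulator itself: the `Simulator` class, the module numbers, and the action names are hypothetical. The point is that simulated time jumps straight to the next pending event instead of advancing tick by tick.

```python
import heapq
from itertools import count

class Simulator:
    """Minimal event-driven loop: always process the earliest pending event."""

    def __init__(self):
        self.now = 0
        self._queue = []      # heap of (time, sequence, module, action)
        self._seq = count()   # tie-breaker so equal-time events keep insertion order
        self.log = []

    def schedule(self, delay, module, action):
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), module, action))

    def run(self):
        while self._queue:
            # Jump directly to the time of the next event; idle time costs nothing.
            self.now, _, module, action = heapq.heappop(self._queue)
            self.log.append((self.now, module, action))

sim = Simulator()
sim.schedule(5, 1, "reduce")  # hypothetical event on inference module 1
sim.schedule(2, 0, "send")    # hypothetical event on inference module 0
sim.run()
print(sim.log)
```

Events are processed in timestamp order regardless of the order in which they were scheduled, so the cost of a run depends on the number of events rather than on the length of the simulated interval.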

Integrated Network Management | 2003

VPDC: virtual private data center: a flexible and rapid workload-management system

Mineyoshi Masuda; Yutaka Yoshimura; Toshiaki Tarui; Toru Shonai; Mamoru Sugie

Rapid server allocation implemented on a virtual private data center (VPDC), which is an autonomous server allocation system for a three-tier Web system, has been developed and tested. The test results show that, with this new system, elapsed time for application server allocation is about 20 seconds, and that for database server allocation is within 140 seconds.

Integrated Network Management | 2003

VPDC: Virtual Private Data Center

Mineyoshi Masuda; Yutaka Yoshimura; Toshiaki Tarui; Toru Shonai; Mamoru Sugie

Rapid server allocation implemented on a Virtual Private Data Center (VPDC), which is an autonomous server allocation system for a three-tier web system, has been developed and tested. The test results show that, with this new system, the elapsed time for application server allocation is about 20 seconds, and that for database server allocation is 140 seconds.

IEEE Transactions on Magnetics | 1983

A single board bubble memory with perfect nonvolatility and a parallel operation system using these memories

Mamoru Sugie; Takashi Toyooka; H. Aoki; Shigeru Yoshizawa; Yutaka Sugita

A single board bubble memory with perfect nonvolatility and an asynchronous parallel operation system using these memories have been developed. Through the use of a saturable coil and a capacitor, the voltage level of the bubble memory power source is maintained. Consequently, the proper shut-down sequence can be observed, even in case of power failures for either open or short circuits. The coil has a small core section to keep it saturated and to reduce its induction during normal operation. When a short circuit occurs, the coil loses saturation and prevents a discharge from the capacitor. Perfect nonvolatility of the bubble memory has been confirmed by 1000 successful power failure tests. In the extended memory, several boards operate asynchronously in parallel through a simple bus arbiter to increase the data transfer rate. Coincidence of the data stream during read and write operations is achieved by the periodic assignment of each memory board to the main memory for every word. This organization adds flexibility to the capacity and the data transfer rate of the bubble memory.
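The periodic per-word assignment of boards can be read as round-robin word interleaving. The sketch below assumes that reading (word i goes to board i mod N), which the abstract does not spell out, and the function names are hypothetical. It shows that reading in the same periodic order reproduces the written stream.

```python
def interleave_write(words, n_boards):
    """Distribute a word stream over the boards, one word per board in turn."""
    boards = [[] for _ in range(n_boards)]
    for i, word in enumerate(words):
        boards[i % n_boards].append(word)
    return boards

def interleave_read(boards, n_words):
    """Read the words back by visiting the boards in the same periodic order."""
    n = len(boards)
    return [boards[i % n][i // n] for i in range(n_words)]

stream = list(range(10))
boards = interleave_write(stream, 4)
print(boards)
print(interleave_read(boards, len(stream)) == stream)
```

Under this scheme, changing the number of boards changes both the capacity and the aggregate transfer rate without changing the access pattern seen by the main memory.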
