Masato Oguchi | Researchain

Publication

Featured research published by Masato Oguchi.

high performance distributed computing | 1996

A study of caching proxy mechanisms realized on wide area distributed networks

Masato Oguchi; Kinji Ono

Information retrieval systems on wide area distributed networks, such as the World-Wide Web (WWW), have become popular among an extremely large number of users. The caching proxy plays an important role in these systems by improving their accessibility and serviceability. The caching proxy mechanism is discussed in this paper. First, the role and structure of the caching proxy are explained, and two major problems of existing systems are pointed out. Our solutions to these problems are proposed next. To overcome one problem, we propose changing the file size by controlling the quality level of cached multimedia data. As a solution to the other problem, we propose forming clusters of neighboring caching proxies using hyperlink information. Finally, an improved caching proxy mechanism based upon these ideas is shown.

high performance distributed computing | 1998

Optimizing protocol parameters to large scale PC cluster and evaluation of its effectiveness with parallel data mining

Masato Oguchi; Takahiko Shintani; Takayuki Tamura; Masaru Kitsuregawa

PC clusters have been studied intensively for next-generation large scale parallel computers. ATM technology is a strong candidate as a de facto standard of high speed communication networks. Therefore an ATM connected PC cluster is a very promising platform from the cost/performance point of view, as a future high performance computing environment. An ATM connected PC cluster consisting of 100 PCs is reported, and characteristics of a transport layer protocol for the PC cluster are evaluated. Point-to-point communication performance is measured and discussed when a TCP window size parameter is changed. Retransmission caused by cell loss at the ATM switch is analyzed, and parameters of the retransmission mechanism suitable for parallel processing on the large scale PC cluster are clarified. From the viewpoint of applications, data intensive applications such as data mining and ad-hoc query processing in databases are considered to be very important for massively parallel processors, in addition to conventional scientific calculations. Thus, investigating the feasibility of such applications on an ATM connected PC cluster is quite meaningful. Parallel data mining is implemented and evaluated on the cluster. The default TCP protocol cannot provide good performance, since a lot of collisions happen during all-to-all multicasting executed on the large scale PC cluster. Using TCP parameters according to the proposed optimization, sufficient performance improvement is achieved for parallel data mining on 100 PCs.
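As an illustration of the window size parameter mentioned in this abstract, the following is a minimal sketch of how a TCP window is commonly sized and requested through socket buffers. It is not the authors' configuration: the link speed and round-trip time are hypothetical values, and the function names are made up for this example. The receive buffer bounds the window a TCP endpoint can advertise, and the operating system may round or cap the requested size.

```python
import socket


def bandwidth_delay_product(bandwidth_bps, rtt_us):
    """Bytes that must be in flight to keep a link busy.

    bandwidth_bps: link speed in bits per second.
    rtt_us: round-trip time in microseconds (integer, to avoid float rounding).
    """
    return bandwidth_bps * rtt_us // 8_000_000


def make_tcp_socket(window_bytes):
    """Create a TCP socket and request send/receive buffers of window_bytes.

    The OS may adjust the request, so the effective receive buffer size
    is read back and returned alongside the socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, window_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window_bytes)
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, effective


# Hypothetical figures chosen for illustration, not taken from the paper.
LINK_BPS = 155_000_000   # assumed link speed
RTT_US = 1_000           # assumed 1 ms round-trip time
window = bandwidth_delay_product(LINK_BPS, RTT_US)
```

A caller would pass `window` to `make_tcp_socket` before connecting and compare the returned effective size with the request.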

high performance distributed computing | 1995

A proposal for a DSM architecture suitable for a widely distributed environment and its evaluation

Masato Oguchi; Hitoshi Aida; Tadao Saito

To realize functionally distributed computing in a wide area distributed environment, distributed shared memory (DSM) is an attractive option due to the simplicity and flexibility in software programming. DSM has mainly been studied in a local environment. In a widely distributed environment, the latency of communication impacts system performance, even if a high bandwidth network is used. DSM models in a widely distributed environment are discussed and evaluated in this paper. First, two existing DSM models are examined: shared virtual memory and replicated shared memory. Next, an innovative replicated shared memory model, which uses internal machine memory, is proposed. A prototype of this model using multi-thread programming was implemented on multi-CPU SPARCstations. These DSM models are compared with SCRAMNet, whose mechanism is based on replicated shared memory. Results from this evaluation show the superiority of the replicated shared memory compared to shared virtual memory when the length of the network is large. While replicated shared memory using external memory is influenced by the ratio of local and global accesses, replicated shared memory using internal machine memory is suitable for a wide variety of cases. The replicated shared memory model is considered to be suitable particularly for applications which impose real time operation in a widely distributed environment, since some latency hiding techniques such as context switching or data prefetching are not effective for real time demands.

database and expert systems applications | 2002

Run-Time Load Balancing System on SAN-connected PC Cluster for Dynamic Injection of CPU and Disk Resource - A Case Study of Data Mining Application

Kazuo Goda; Takayuki Tamura; Masato Oguchi; Masaru Kitsuregawa

A PC cluster system is an attractive platform for data-intensive applications. However, the conventional shared-nothing system has limited load balancing performance, and it is difficult to change the number of nodes and disks dynamically during execution. In this paper, we develop dynamic resource injection, where the system can inject CPU power and expand I/O bandwidth by adding nodes and disks dynamically in a SAN (Storage Area Network)-connected PC cluster. Our experiments with a data mining application confirm its effectiveness. We show the advantages of combining a PC cluster with a SAN.

ieee international conference on high performance computing data and analytics | 1997

Characteristics of a Parallel Data Mining Application Implemented on an ATM Connected PC Cluster

Masato Oguchi; Takahiko Shintani; Takayuki Tamura; Masaru Kitsuregawa

Until recently, workstations were overwhelmingly superior to personal computers in terms of performance. However, recent PC technology has dramatically increased its CPU, main memory, and cache memory performance. Therefore massively parallel computer systems are moving away from proprietary components such as CPU, disks, etc. to commodity parts.

international conference on communications | 2001

Data mining on PC cluster connected with storage area network: its preliminary experimental results

Masato Oguchi; Masaru Kitsuregawa

Personal computer/workstation (PC/WS) clusters have become a hot research topic in the field of parallel and distributed computing. They are considered to play an important role as a large scale computer system, such as large server sites and/or high performance parallel computers, because of their good scalability and cost performance ratio. From the viewpoint of applications, data intensive applications such as data mining and ad-hoc query processing in databases are considered very important for massively parallel processors, in addition to the conventional scientific calculation. Thus, investigating the feasibility of such applications on a PC cluster is meaningful. A PC cluster connected with a storage area network (SAN) is built and evaluated. For disk-to-disk copy operation, SAN clusters are much better than LAN clusters. A data mining application is implemented on the cluster. This application requires iterative scans of shared disks, which degrade the execution performance due to an I/O bottleneck. In order to resolve the problem, a dynamic data copy method is proposed and evaluated. This method prevents the performance degradation caused by the shared disk bottleneck in SAN clusters.

european conference on parallel processing | 1999

Performance Analysis for Parallel Generalized Association Rule Mining on a Large Scale PC Cluster

Takahiko Shintani; Masato Oguchi; Masaru Kitsuregawa

One of the most important problems in data mining is the discovery of association rules in large databases. We previously proposed parallel algorithms for mining generalized association rules with a classification hierarchy. In this paper, we implemented the proposed algorithms on a large scale PC cluster consisting of one hundred PCs interconnected by an ATM switch, and analyzed the performance of our algorithms using a large transaction dataset. Performance evaluations show that our parallel algorithms are effective at handling skew on such large scale parallel systems.
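To make the idea of generalized association rules with a classification hierarchy concrete, here is a small sequential sketch. It is not the parallel algorithm of the paper: it uses the common approach of extending each transaction with the ancestors of its items before counting item pairs. The hierarchy, the item names and the function names are all invented for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical classification hierarchy: child -> parent.
PARENT = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}


def ancestors(item, parent):
    """Return all ancestors of item, nearest first."""
    found = []
    while item in parent:
        item = parent[item]
        found.append(item)
    return found


def extend(transaction, parent):
    """Add every ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended


def count_pairs(transactions, parent):
    """Count support of item pairs over ancestor-extended transactions."""
    counts = Counter()
    for transaction in transactions:
        for a, b in combinations(sorted(extend(transaction, parent)), 2):
            # A pair made of an item and its own ancestor carries no information.
            if a in ancestors(b, parent) or b in ancestors(a, parent):
                continue
            counts[(a, b)] += 1
    return counts


transactions = [{"jacket", "boots"}, {"ski pants", "boots"}, {"shirt"}]
pair_counts = count_pairs(transactions, PARENT)
```

In this toy data no pair of leaf items occurs twice, but the generalized pair `("boots", "outerwear")` does, which is the kind of rule a hierarchy makes visible.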

Electronics and Communications in Japan Part I-communications | 1999

Implementation of parallel data mining on an ATM-connected PC cluster and performance analysis of TCP retransmission mechanisms

Masato Oguchi; Takayuki Tamura; Takahiko Shintani; Masaru Kitsuregawa

A recent tendency in parallel computer design has been to use general-purpose components for system configuration elements such as CPUs, disks, and memories, which used to be specially developed. Although the connection network between the processors has been specially developed, it is now possible to configure a large-scale PC cluster with good performance at low cost by making use of an ATM network as a processor connection network because of the development and cost reduction of ATM network technologies in the communication field. In this paper, a large-scale PC cluster is constructed by connecting 100 personal computers by means of a general-purpose ATM network. Applications to parallel data mining are evaluated and discussed. In particular, an analysis is carried out with a focus on the effect of TCP retransmission with cell discarding of the ATM switch on the performance. The parameter setting of a retransmission mechanism suitable for the parallel processing in the cluster is found. Further, by developing a method for setting the retransmission spacing parameters to random values for each node, it is shown that a further improvement is possible.
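The benefit of giving each node a different retransmission interval can be illustrated with a toy model. This is only a sketch under assumed numbers, not the mechanism evaluated in the paper: every node is assumed to lose a packet at time zero, the timeout values and slot width are hypothetical, and the function names are invented here. With one fixed timeout all nodes retransmit in the same time slot, while random per-node timeouts spread the retransmissions out.

```python
import random
from collections import Counter


def peak_retransmissions(timeouts_ms, slot_ms=10):
    """Largest number of nodes retransmitting within the same time slot."""
    slots = Counter(int(t // slot_ms) for t in timeouts_ms)
    return max(slots.values())


def fixed_timeouts(n_nodes, timeout_ms=200.0):
    """Every node uses the same retransmission timeout (assumed value)."""
    return [timeout_ms] * n_nodes


def randomized_timeouts(n_nodes, low_ms=200.0, high_ms=500.0, seed=0):
    """Each node draws its own timeout uniformly from [low_ms, high_ms]."""
    rng = random.Random(seed)
    return [rng.uniform(low_ms, high_ms) for _ in range(n_nodes)]


N_NODES = 100
peak_fixed = peak_retransmissions(fixed_timeouts(N_NODES))
peak_random = peak_retransmissions(randomized_timeouts(N_NODES))
```

Comparing `peak_fixed` with `peak_random` shows how many retransmissions would arrive at the switch in the busiest slot under each policy in this simplified model.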

international parallel and distributed processing symposium | 2000

Parallel Data Mining on ATM-Connected PC Cluster and Optimization of Its Execution Environments

Masato Oguchi; Masaru Kitsuregawa

In this paper, we have constructed a large scale ATM-connected PC cluster consisting of 100 PCs, implemented a data mining application, and optimized its execution environment. The default parameters of the TCP retransmission mechanism cannot provide good performance for the data mining application, since a lot of collisions occur in the case of all-to-all multicasting in the large scale PC cluster. Using TCP retransmission parameters according to the proposed parameter optimization, reasonably good performance improvement is achieved for parallel data mining on 100 PCs. Association rule mining, one of the best-known problems in data mining, differs from conventional scientific calculations in its usage of main memory. We have investigated the feasibility of using available memory on remote nodes as a swap area when working nodes need to swap out their real memory contents. According to the experimental results on our PC cluster, the proposed method is expected to be considerably better than using hard disks as a swapping device.

international conference on supercomputing | 1999

Dynamic remote memory acquisition for parallel data mining on ATM-connected PC cluster

Masato Oguchi; Masaru Kitsuregawa

Personal computer/Workstation (PC/WS) clusters are promising candidates for future high performance computers, because of their good scalability and cost performance ratio. Data intensive applications, such as data mining and data warehousing, have become very important applications for high performance computing. We previously developed a large scale PC cluster connected with ATM, and implemented several database applications, including parallel data mining, to evaluate their performance and the feasibility of such applications using PC clusters. Association rule mining, one of the best-known problems in data mining, differs from conventional scientific calculations in its usage of main memory. It allocates many small data areas in main memory, and the number of those areas suddenly grows enormously during execution. Thus, the requirement for memory space changes dynamically and becomes extremely large. As a result, the contents of memory must be swapped out if the requirement exceeds the real memory size. However, because the size of each data area is rather small and the elements are accessed almost at random, swapping out to a storage device must degrade the performance severely in this case. We are investigating the feasibility of using available memory on idle nodes as a swap area when working nodes need to swap out their real memory contents during the execution of parallel data mining on PC clusters. In many cases, idle nodes are expected to exist in large clusters. In this paper, we report our experiments in which nodes executing applications acquire extra memory dynamically from several available idle nodes through a high-speed ATM network in our pilot system. The experimental results on our PC cluster show that the proposed method is expected to be considerably better than using hard disks as a swapping device. Moreover, a method using a distant node's memory with remote update operations, which is expected to prevent a thrashing problem, is proposed and evaluated.
