Keiji Shinzato | Researchain

Publication

Featured research published by Keiji Shinzato.

Journal of Information Processing | 2012

TSUBAKI: An Open Search Engine Infrastructure for Developing Information Access Methodology

Keiji Shinzato; Tomohide Shibata; Daisuke Kawahara; Sadao Kurohashi

Due to the explosive growth in the amount of information over the last decade, it is becoming extremely hard to obtain necessary information with conventional information access methods. Hence, drastically new technology is needed, and developing it requires search engine infrastructures. Although existing search engine APIs can be regarded as such infrastructures, they have several restrictions, such as a limit on the number of API calls. To help the development of new technology, we are running an open search engine infrastructure, TSUBAKI, on a high-performance computing environment. In this paper, we describe the TSUBAKI infrastructure.

Web Intelligence | 2009

Web Information Organization Using Keyword Distillation Based Clustering

Tomohide Shibata; Yasuo Bamba; Keiji Shinzato; Sadao Kurohashi

This paper describes a system that conducts search result clustering for several thousand Web pages and elaborates cluster labels through keyword distillation. Keyword distillation is a method that properly handles spelling variations, transliterations, synonyms, inclusion relations and word ambiguity, using linguistic resources and the context of a user's query. The system provides a clustering result from 1,000 pages in less than one minute by taking advantage of a search engine infrastructure and a grid computing environment. Experimental results show that the system correctly merges synonymous keywords and is useful for finding topics hidden in the lower-ranked pages of a search result.

International Universal Communication Symposium | 2009

Development of a large-scale web crawler and search engine infrastructure

Susumu Akamine; Yoshikiyo Kato; Daisuke Kawahara; Keiji Shinzato; Kentaro Inui; Sadao Kurohashi; Yutaka Kidawara

This paper reports the ongoing development of a large-scale Web crawler and search engine infrastructure at the National Institute of Information and Communications Technology. This infrastructure has the following characteristics: (1) It collects one billion Japanese Web pages while keeping them up-to-date. (2) It selects 100 million pages from among the collected pages and converts them into a standard data format to store the results of morphological analysis, dependency parsing, and synonym augmentation. (3) The selected set of pages is searchable and accessible to users. (4) The scalability of the system is achieved by using a large-scale cluster machine for distributed data processing.

International Joint Conference on Natural Language Processing | 2008