Publication


Featured research published by Tomohide Shibata.


Journal of Information Processing | 2012

TSUBAKI: An Open Search Engine Infrastructure for Developing Information Access Methodology

Keiji Shinzato; Tomohide Shibata; Daisuke Kawahara; Sadao Kurohashi

Due to the explosive growth in the amount of information over the last decade, it is becoming extremely hard to obtain necessary information with conventional information access methods. Hence, drastically new technology is needed, and developing such technology requires search engine infrastructures. Although existing search engine APIs can be regarded as such infrastructures, these APIs have several restrictions, such as a limit on the number of API calls. To help the development of new technology, we are running an open search engine infrastructure, TSUBAKI, on a high-performance computing environment. In this paper, we describe the TSUBAKI infrastructure.


International Joint Conference on Natural Language Processing | 2005

Automatic slide generation based on discourse structure analysis

Tomohide Shibata; Sadao Kurohashi

In this paper, we describe a method of automatically generating summary slides from a text. The slides are generated by itemizing topic/non-topic parts that are extracted from the text based on syntactic/case analysis. The indentation of the items is controlled according to the discourse structure, which is detected using cue phrases, identification of word chains, and similarity between two sentences. Our experiments demonstrate that the generated slides are far easier to read than the original texts.


Web Intelligence | 2009

Web Information Organization Using Keyword Distillation Based Clustering

Tomohide Shibata; Yasuo Bamba; Keiji Shinzato; Sadao Kurohashi

This paper describes a system that conducts search result clustering for several thousand Web pages and elaborates cluster labels through keyword distillation. Keyword distillation is a method that properly handles spelling variations, transliterations, synonyms, inclusion relations and word ambiguity, using linguistic resources and the context of a user's query. The system provides a clustering result from 1,000 pages in less than one minute by taking advantage of a search engine infrastructure and a grid computing environment. Experimental results show that the system correctly merged synonymous keywords and is useful for finding topics hidden in the lower-ranked pages of a search result.


Archive | 2016

Chat-Like Conversational System Based on Selection of Reply Generating Module with Reinforcement Learning

Tomohide Shibata; Yusuke Egashira; Sadao Kurohashi

This paper presents a chat-like conversational system that generates a reply by selecting an appropriate reply generating module. Such modules include selecting a sentence from a Web news article, retrieving a definition sentence from Wikipedia, question answering, and so on. A dialogue strategy specifies which reply generating module should be chosen given a user input and the dialogue history, and is learned in the Markov decision process (MDP) framework. User evaluations showed that our system could learn an appropriate dialogue strategy and carry out natural dialogues.


Meeting of the Association for Computational Linguistics | 2006

Unsupervised Topic Identification by Integrating Linguistic and Visual Information Based on Hidden Markov Models

Tomohide Shibata; Sadao Kurohashi

This paper presents an unsupervised topic identification method integrating linguistic and visual information based on Hidden Markov Models (HMMs). We employ HMMs for topic identification, wherein a state corresponds to a topic and various features including linguistic, visual and audio information are observed. Our experiments on two kinds of cooking TV programs show the effectiveness of our proposed method.
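As a rough illustration of the idea that each HMM state corresponds to a topic, the sketch below decodes the most likely topic sequence from a sequence of observed cue words using the standard Viterbi algorithm. It is a minimal sketch under stated assumptions, not the method of the paper: the topic names, cue words and all probabilities are hypothetical values chosen for illustration, only a single discrete linguistic feature is observed, and the parameters are fixed by hand rather than learned in an unsupervised manner.

```python
import math


def log_table(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        k: log_table(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state (topic) sequence for the observations."""
    # best[t][s]: best log-probability of any path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two topics and three cue words (all values assumed).
STATES = ["preparation", "frying"]
LOG_START = log_table({"preparation": 0.6, "frying": 0.4})
LOG_TRANS = log_table({
    "preparation": {"preparation": 0.7, "frying": 0.3},
    "frying": {"preparation": 0.2, "frying": 0.8},
})
LOG_EMIT = log_table({
    "preparation": {"cut": 0.5, "peel": 0.4, "heat": 0.1},
    "frying": {"cut": 0.1, "peel": 0.1, "heat": 0.8},
})

if __name__ == "__main__":
    words = ["cut", "peel", "heat", "heat"]
    print(viterbi(words, STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

Working in log space avoids numerical underflow on long sequences. In an unsupervised setting the start, transition and emission parameters would be estimated from unlabeled data rather than set by hand as they are here.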


Workshop on Information Credibility on the Web | 2008

Extracting the author of web pages

Yoshikiyo Kato; Daisuke Kawahara; Kentaro Inui; Sadao Kurohashi; Tomohide Shibata

In this paper, we define the problem of identifying the author of a Web page as a sub-problem of identifying the information sender configuration of a Web page. We propose a method that extracts author name candidates from a Web page based on linguistic features and ranks the candidates based on local features such as distance from the main content. The evaluation shows that we can achieve more than 75% precision when considering candidates ranked within the top five.


International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2003

Structural Analysis of Instruction Utterances

Tomohide Shibata; Daisuke Kawahara; Masashi Okamoto; Sadao Kurohashi; Toyoaki Nishida

Toward designing a system that teaches various kinds of work interactively and visually, this paper proposes a method of analyzing instruction utterances. One of the biggest problems in dealing with spoken language is ellipsis/anaphora resolution. We resolve it using a domain-specific case frame dictionary constructed automatically from a large amount of text. Then, we attach an utterance type to each utterance to distinguish actions from notes, tips, etc. Based on the attached type, we analyze the discourse structure of the utterances and detect units of actions.


Web Intelligence | 2009

Identifying Information Sender Configuration of Web Pages

Yoshikiyo Kato; Daisuke Kawahara; Kentaro Inui; Sadao Kurohashi; Tomohide Shibata

The source of a piece of information is a crucial element to consider when judging the credibility of that information. In this paper, we address the task of identifying the information source, which is cast as a problem of identifying the information sender configuration (ISC) of a Web page. An information sender of a Web page is an entity that is involved in the publication of the information on the page. An ISC of a Web page describes the information senders of the page and the relationships among them. Information sender extraction is thus a subtask of identifying the ISC, and we present a method for extracting information senders from Web pages and offer a preliminary evaluation. The ISC provides a basis for deeper analysis of information on the Web.


ACM Multimedia | 2007

Automatic object model acquisition and object recognition by integrating linguistic and visual information

Tomohide Shibata; Norio Kato; Sadao Kurohashi

To make the best use of multimedia content, the crucial point is structural analysis of the content, in which several media processing techniques, including image, audio and text analyses, should be integrated. To understand utterances in videos in accordance with the scene, it is essential to recognize what objects appear in the videos. In this paper, we focus on Japanese cooking TV videos, and propose a method for acquiring object models of foods in an unsupervised manner and performing object recognition based on the acquired object models. First, the topic of each video segment is identified based on HMMs to obtain good examples for object model acquisition. After that, close-up images are extracted from image sequences, and an attention region in each close-up image is determined. Then, an important word is extracted as a keyword from the utterances around the close-up image and is associated with the close-up image. By collecting pairs of close-up images and keywords from a large number of videos, object models are acquired. After acquiring the object models, object recognition is performed based on the acquired object models and linguistic information. We conducted experiments on two kinds of cooking TV programs. We acquired object models of around 100 foods with an accuracy of 77.8%. The F-measure of object recognition was 0.727.


Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications | 2009

Bottom-up Named Entity Recognition using Two-stage Machine Learning Method

Hirotaka Funayama; Tomohide Shibata; Sadao Kurohashi

This paper proposes Japanese bottom-up named entity recognition using a two-stage machine learning method. Most work has formalized named entity recognition as a sequential labeling problem, in which only local information is utilized for label estimation, and thus a long named entity consisting of several morphemes tends to be wrongly recognized. Our proposed method regards a compound noun (chunk) as a labeling unit, and first estimates the labels of all the chunks in a phrasal unit (bunsetsu) using a machine learning method. Then, the best label assignment in the bunsetsu is determined bottom-up, as in the CKY parsing algorithm, using a machine learning method. We conducted an experiment on CRL NE data and achieved an F-measure of 89.79, which is higher than that of previous work.

Collaboration


Dive into Tomohide Shibata's collaborations.

Top Co-Authors

Yoshikiyo Kato

National Institute of Information and Communications Technology

Katsuhito Sudoh

Nippon Telegraph and Telephone

Masaaki Nagata

Nippon Telegraph and Telephone
