Publication


Featured research published by Takehito Utsuro.


meeting of the association for computational linguistics | 1993

STRUCTURAL MATCHING OF PARALLEL TEXTS

Yuji Matsumoto; Takehito Utsuro; Hiroyuki Ishimoto

This paper describes a method for finding structural matching between parallel sentences of two languages (such as Japanese and English). Parallel sentences are analyzed based on unification grammars, and structural matching is performed by making use of a similarity measure of word pairs in the two languages. Syntactic ambiguities are resolved simultaneously in the matching process. The results serve as a useful source for extracting linguistic and lexical knowledge.


international conference on computational linguistics | 1994

Bilingual text matching using bilingual dictionary and statistics

Takehito Utsuro; Hiroshi Ikeda; Masaya Yamane; Yuji Matsumoto; Makoto Nagao

This paper describes a unified framework for bilingual text matching by combining existing hand-written bilingual dictionaries and statistical techniques. The process of bilingual text matching consists of two major steps: sentence alignment and structural matching of bilingual sentences. Statistical techniques are applied to estimate word correspondences not included in bilingual dictionaries. Estimated word correspondences are useful for improving both sentence alignment and structural matching.


Life-like characters | 2004

Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents

Shinichi Kawamoto; Hiroshi Shimodaira; Tsuneo Nitta; Takuya Nishimoto; Satoshi Nakamura; Katsunobu Itou; Shigeo Morishima; Tatsuo Yotsukura; Atsuhiko Kai; Akinobu Lee; Yoichi Yamashita; Takao Kobayashi; Keiichi Tokuda; Keikichi Hirose; Nobuaki Minematsu; Atsushi Yamada; Yasuharu Den; Takehito Utsuro; Shigeki Sagayama

Galatea is a software toolkit to develop a human-like spoken dialog agent. In order to easily integrate the modules of different characteristics including speech recognizer, speech synthesizer, facial animation synthesizer, and dialog controller, each module is modeled as a virtual machine having a simple common interface and connected to each other through a broker (communication manager). Galatea employs model-based speech and facial animation synthesizers whose model parameters are adapted easily to those for an existing person if his or her training data is given. The software toolkit that runs on both UNIX/Linux and Windows operating systems will be publicly available in the middle of 2003 [7, 6].


international conference on computational linguistics | 2000

Named entity chunking techniques in supervised learning for Japanese named entity recognition

Manabu Sassano; Takehito Utsuro

This paper focuses on the issue of named entity chunking in Japanese named entity recognition. We apply the supervised decision list learning method to Japanese named entity recognition. We also investigate and incorporate several named-entity noun phrase chunking techniques and experimentally evaluate and compare their performance. In addition, we propose a method for incorporating richer contextual information as well as patterns of constituent morphemes within a named entity, which have not been considered in previous research, and show that the proposed method outperforms these previous approaches.
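To illustrate the decision list learning mentioned in this abstract, the following is a minimal sketch of a generic decision list classifier, not the authors' implementation. It assumes each training example is a set of string features with one label, and it ranks rules by a smoothed log-odds score. The feature names (`next=san`, `suffix=shi`, `pos=noun`), the label set, and the smoothing constant `alpha` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, alpha=0.1):
    """Build a decision list from (feature_set, label) pairs.

    Each rule is (score, feature, label); the score is a smoothed
    log-odds of the label against all other labels for that feature.
    """
    counts = defaultdict(Counter)   # feature -> label -> count
    labels = Counter()              # label -> count, for the default rule
    for feats, label in examples:
        labels[label] += 1
        for f in feats:
            counts[f][label] += 1

    rules = []
    for f, by_label in counts.items():
        total = sum(by_label.values())
        for label, c in by_label.items():
            rest = total - c
            score = math.log((c + alpha) / (rest + alpha))
            rules.append((score, f, label))

    # Strongest evidence first; the first matching rule decides.
    rules.sort(key=lambda r: r[0], reverse=True)
    default = labels.most_common(1)[0][0]
    return rules, default


def classify(rules, default, feats):
    """Return the label of the highest-ranked rule whose feature is present."""
    for _score, f, label in rules:
        if f in feats:
            return label
    return default


# Hypothetical toy data: features and labels are illustrative only.
examples = [
    ({"next=san", "pos=noun"}, "PERSON"),
    ({"next=san", "pos=noun"}, "PERSON"),
    ({"suffix=shi", "pos=noun"}, "LOCATION"),
    ({"suffix=shi", "pos=noun"}, "LOCATION"),
    ({"pos=noun"}, "O"),
    ({"pos=noun"}, "O"),
    ({"pos=noun"}, "O"),
]
rules, default = train_decision_list(examples)
print(classify(rules, default, {"next=san", "pos=noun"}))
```

Only the single strongest matching rule is used for each decision, which is what distinguishes a decision list from classifiers that combine all available evidence.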


international conference on computational linguistics | 1994

Thesaurus-based efficient example retrieval by generating retrieval queries from similarities

Takehito Utsuro; Kiyotaka Uchimoto; Mitsutaka Matsumoto; Makoto Nagao

In example-based NLP, the computational cost of example retrieval is a severe problem, since the retrieval time increases in proportion to the number of examples in the database. This paper proposes a novel example retrieval method that avoids full retrieval of examples. The proposed method has the following three features: 1) it generates retrieval queries from similarities, 2) it retrieves examples efficiently through the tree structure of a thesaurus, and 3) it performs binary search along the subsumption ordering of retrieval queries. Example retrieval time drastically decreases with the method.


international conference on computational linguistics | 1992

Lexical knowledge acquisition from bilingual corpora

Takehito Utsuro; Yuji Matsumoto; Makoto Nagao

For practical research in natural language processing, it is indispensable to develop a large scale semantic dictionary for computers. It is especially important to improve the techniques for compiling semantic dictionaries from natural language texts such as those in existing human dictionaries or in large corpora. However, there are at least two difficulties in analyzing existing texts: the problem of syntactic ambiguities and the problem of polysemy. Our approach to solve these difficulties is to make use of translation examples in two distinct languages that have quite different syntactic structures and word meanings. The reason we took this approach is that in many cases both syntactic and semantic ambiguities are resolved by comparing analyzed results from both languages. In this paper, we propose a method for resolving the syntactic ambiguities of translation examples of bilingual corpora and a method for acquiring lexical knowledge, such as case frames of verbs and attribute sets of nouns.


international conference on the computer processing of oriental languages | 2006

Compilation of a dictionary of Japanese functional expressions with hierarchical organization

Suguru Matsuyoshi; Satoshi Sato; Takehito Utsuro

The Japanese language has a lot of functional expressions, which consist of more than one word and behave like a single functional word. A remarkable characteristic of Japanese functional expressions is that each functional expression has many different surface forms. This paper proposes a methodology for compiling a dictionary of Japanese functional expressions with hierarchical organization. We use a hierarchy with nine abstraction levels: the root node is a dummy node that governs all entries; a node in the first level is a headword in the dictionary; a leaf node corresponds to a surface form of a functional expression. Two or more lists of functional expressions can be integrated into this hierarchy. This hierarchy also provides a way to systematically generate all the different surface forms. We have compiled the dictionary with 292 headwords and 13,958 surface forms, which cover almost all major functional expressions.


International Conference on NLP | 2012

Applying a Burst Model to Detect Bursty Topics in a Topic Model

Yusuke Takahashi; Takehito Utsuro; Masaharu Yoshioka; Noriko Kando; Tomohiro Fukuhara; Hiroshi Nakagawa; Yoji Kiyota

This paper focuses on two types of modeling of information flow in a news stream, namely burst analysis and topic modeling. First, to detect topics that receive much more attention than usual, one would normally have to watch every article in the news stream at every moment. For this situation, it is well known in the field of time series analysis that Kleinberg’s modeling of bursts is quite effective in detecting bursts of keywords. Second, topic models such as LDA (latent Dirichlet allocation) are also quite effective in estimating the distribution of topics over a document collection such as the articles in a news stream. However, Kleinberg’s modeling of bursts is usually applied only to bursts of keywords, not to those of topics. Considering this fact, we propose a way to apply Kleinberg’s modeling of bursts to topics estimated by a topic model such as LDA or DTM (dynamic topic model).
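As a rough illustration of the burst modeling referred to in this abstract, the sketch below implements a two-state version of Kleinberg's burst automaton for batched counts and decodes the cheapest state sequence with the Viterbi algorithm. It is not the paper's method: how topic weights from LDA or DTM are turned into per-period counts is an assumption here (the example simply treats `relevant[t]` as the number of documents assigned to a topic out of `total[t]` documents in period `t`), and the function name and the parameters `s` and `gamma` are illustrative.

```python
import math


def detect_bursts(relevant, total, s=2.0, gamma=1.0):
    """Label each period 0 (normal) or 1 (burst) with a two-state automaton.

    relevant[t] of total[t] documents in period t match the topic.
    State 0 emits at the overall base rate p0, state 1 at s * p0.
    Entering the burst state costs gamma * ln(n); leaving it is free.
    """
    n = len(relevant)
    if n == 0 or sum(relevant) == 0:
        return [0] * n

    p0 = min(sum(relevant) / sum(total), 0.99)
    p1 = min(s * p0, 0.999)

    def cost(p, r, d):
        # Negative log of the binomial probability of d hits in r trials.
        log_choose = math.lgamma(r + 1) - math.lgamma(d + 1) - math.lgamma(r - d + 1)
        return -(log_choose + d * math.log(p) + (r - d) * math.log(1 - p))

    up = gamma * math.log(n)
    best = [0.0, float("inf")]   # the automaton starts in state 0
    back = []                    # back[t][j] = best previous state for state j
    for d, r in zip(relevant, total):
        emit = (cost(p0, r, d), cost(p1, r, d))
        new_best, ptr = [], []
        for j in (0, 1):
            cands = [best[i] + (up if (i == 0 and j == 1) else 0.0) for i in (0, 1)]
            i_best = 0 if cands[0] <= cands[1] else 1
            new_best.append(cands[i_best] + emit[j])
            ptr.append(i_best)
        best = new_best
        back.append(ptr)

    # Trace the cheapest path backwards from the final period.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for ptr in reversed(back[1:]):
        state = ptr[state]
        states.append(state)
    states.reverse()
    return states


# Hypothetical daily counts for one topic: a three-day spike in the middle.
print(detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8))
# prints [0, 0, 0, 1, 1, 1, 0, 0]
```

Raising `gamma` makes the automaton more reluctant to enter the burst state, while raising `s` requires a larger jump over the base rate before a period counts as bursty.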


international acm sigir conference on research and development in information retrieval | 2009

Evaluating effects of machine translation accuracy on cross-lingual patent retrieval

Atsushi Fujii; Masao Utiyama; Mikio Yamamoto; Takehito Utsuro

We organized a machine translation (MT) task at the Seventh NTCIR Workshop. Participating groups were requested to machine translate sentences in patent documents and also search topics for retrieving patent documents across languages. We analyzed the relationship between the accuracy of MT and its effects on the retrieval accuracy.


adversarial information retrieval on the web | 2008

Analysing features of Japanese splogs and characteristics of keywords

Yuuki Sato; Takehito Utsuro; Yoshiaki Murakami; Tomohiro Fukuhara; Hiroshi Nakagawa; Yasuhide Kawada; Noriko Kando

This paper focuses on analyzing (Japanese) splogs based on various characteristics of the keywords contained in them. We estimate the behavior of spammers when creating splogs from other sources by analyzing the characteristics of keywords contained in splogs. Since splogs often introduce noise into word occurrence statistics in the blogosphere, we assume that we can efficiently (manually) collect splogs by sampling blog homepages containing keywords of a certain type on the date of their most frequent occurrence. We manually examine various features of the collected blog homepages, namely whether their text content is excerpted from other sources and whether they display affiliate advertisements or outgoing links to affiliated sites. Among various informative results, it is important to note that more than half of the collected splogs are created by a very small number of spammers.

Collaboration


Dive into Takehito Utsuro's collaborations.

Top Co-Authors

Tomohiro Fukuhara

National Institute of Advanced Industrial Science and Technology

Seiichi Nakagawa

Toyohashi University of Technology

Noriko Kando

National Institute of Informatics

Suguru Matsuyoshi

Nara Institute of Science and Technology
