Takenobu Tokunaga
Tokyo Institute of Technology
Publication
Featured research published by Takenobu Tokunaga.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1999
Rila Mandala; Takenobu Tokunaga; Hozumi Tanaka
Automatic query expansion has been known to be the most important method in overcoming the word mismatch problem in information retrieval. Thesauri have long been used by many researchers as a tool for query expansion. However, only one type of thesaurus has generally been used. In this paper we analyze the characteristics of different thesaurus types and propose a method to combine them for query expansion. Experiments using the TREC collection proved the effectiveness of our method over those using one type of thesaurus.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1995
Makoto Iwayama; Takenobu Tokunaga
Text categorization can be viewed as a process of category search, in which one or more categories for a test document are searched for by using given training documents with known categories. In this paper a cluster-based search with a probabilistic clustering algorithm is proposed and evaluated on two data sets. The efficiency, effectiveness, and noise tolerance of this search strategy were confirmed to be better than those of a full search, a category-based search, and a cluster-based search with nonprobabilistic clustering.
Information Processing and Management | 2000
Rila Mandala; Takenobu Tokunaga; Hozumi Tanaka
This paper proposes a method to improve the performance of information retrieval systems by expanding queries using heterogeneous thesauri. The expansion terms are taken from a hand-crafted thesaurus, a co-occurrence-based automatically constructed thesaurus, and a predicate-argument-based automatically constructed thesaurus. To avoid the effects of wrong expansion terms, a weighting method is devised such that the weight of an expansion term depends not only on the weights of all terms in the query, but also on the weight of that term in each thesaurus. Experiments show that using heterogeneous thesauri with an appropriate weighting method results in better retrieval performance than using only one type of thesaurus.
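The weighting idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the function name `expand_query`, the thesaurus representation (a mapping from a term to its related terms with similarity values), and the scoring rule (query-term weight times similarity, averaged over the thesauri). It is not the authors' method, only one simple way a candidate's weight could depend on all query terms and on every thesaurus at once.

```python
from collections import defaultdict


def expand_query(query_weights, thesauri, top_k=3):
    """Rank expansion candidates using several thesauri together.

    query_weights: {query term: weight}
    thesauri: list of {term: {related term: similarity}} (hypothetical format)
    Scoring rule (an assumption, not taken from the paper): each candidate
    accumulates query_weight * similarity over all query terms, averaged
    over the number of thesauri.
    """
    scores = defaultdict(float)
    for thesaurus in thesauri:
        for q_term, q_weight in query_weights.items():
            for candidate, sim in thesaurus.get(q_term, {}).items():
                if candidate in query_weights:
                    continue  # do not re-add terms already in the query
                scores[candidate] += q_weight * sim / len(thesauri)
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy data standing in for a hand-crafted and a co-occurrence thesaurus.
hand_crafted = {"car": {"automobile": 1.0, "vehicle": 0.5}}
cooccurrence = {
    "car": {"automobile": 0.5, "engine": 0.5},
    "rental": {"lease": 0.5},
}
query = {"car": 1.0, "rental": 0.5}

for term, weight in expand_query(query, [hand_crafted, cooccurrence]):
    print(term, weight)
```

In this toy run, a candidate supported by both thesauri ("automobile") outranks candidates that appear in only one, which is the intuition behind damping wrong expansion terms that a single thesaurus might suggest.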
Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries | 2009
Dain Kaplan; Ryu Iida; Takenobu Tokunaga
This paper proposes a new method based on coreference-chains for extracting citations from research papers. To evaluate our method we created a corpus of citations comprised of citing papers for 4 cited papers. We analyze some phenomena of citations that are present in our corpus, and then evaluate our method against a cue-phrase-based technique. Our method demonstrates higher precision by 7-10%.
Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages | 2003
Mineki Takechi; Takenobu Tokunaga; Yuji Matsumoto; Hozumi Tanaka
Text categorization, as an essential component of applications for user navigation on the World Wide Web using Question-Answering in Japanese, requires more effective features for the categorization of documents and the efficient acquisition of knowledge. In the questions addressed by such navigation, we focus on those questions for procedures and intend to clarify specification of the answers.
Conference of the European Chapter of the Association for Computational Linguistics | 1999
Rila Mandala; Takenobu Tokunaga; Hozumi Tanaka
This paper proposes a method to overcome the drawbacks of WordNet when applied to information retrieval by complementing it with Roget's Thesaurus and corpus-derived thesauri. Words and relations which are not included in WordNet can be found in the corpus-derived thesauri. Effects of polysemy can be minimized with a weighting method that considers all query terms and all of the thesauri. Experimental results show that our method enhances information retrieval performance significantly.
Natural Language Generation | 1992
Kentaro Inui; Takenobu Tokunaga; Hozumi Tanaka
To generate good text, many kinds of decisions should be made. Many researchers have spent much time searching for the architecture that would determine a proper order for these decisions. However, even if such an architecture is found, there are still certain kinds of problems that are difficult to consider during the generation process. Those problems can be more easily detected and solved by introducing a revision process after generation. In this paper, we argue the importance of text revision with respect to natural language generation, and propose a computational model of text revision. We also discuss its implementation issues and describe an experimental Japanese text generation system, WeiveR.
Meeting of the Association for Computational Linguistics | 2006
Takenobu Tokunaga; Virach Sornlertlamvanich; Thatsanee Charoenporn; Nicoletta Calzolari; Monica Monachini; Claudia Soria; Chu-Ren Huang; Yingju Xia; Hao Yu; Laurent Prévot; Kiyoaki Shirai
As an area of great linguistic and cultural diversity, Asian language resources have received much less attention than their western counterparts. Creating a common standard for Asian language resources that is compatible with an international standard has at least three strong advantages: to increase the competitive edge of Asian countries, to bring Asian countries closer to their western counterparts, and to bring more cohesion among Asian countries. To achieve this goal, we have launched a two-year project to create a common standard for Asian language resources. The project is comprised of four research items: (1) building a description framework of lexical entries, (2) building sample lexicons, (3) building an upper-layer ontology, and (4) evaluating the proposed framework through an application. This paper outlines the project in terms of its aim and approach.
Life-like characters | 2004
Hozumi Tanaka; Takenobu Tokunaga; Yusuke Shinyama
This chapter describes a system called Kairai and its Natural Language Understanding (NLU) capabilities. It identifies the system's strengths and shortcomings and identifies requirements for future NLU systems. The NLU research environment has changed drastically in the past two decades. Better technologies in speech recognition, natural language processing, and computer graphics are now available and make it much easier to develop a life-like animated agent (a software robot) which can understand commands in spoken language and perform actions specified by the commands. Combining these technologies, a life-like animated agent system named Kairai was developed at our laboratory to conduct preliminary research on an NLU system. Although Kairai includes many innovative features, several important problems hindering the building of a better NLU system still remain. After describing several issues the Kairai system can handle, we will conclude by outlining what problems we have to solve in the future. The results obtained from our research should be naturally applicable to hardware robots.
Information Processing and Management | 2004
Akira Terada; Takenobu Tokunaga; Hozumi Tanaka
Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read/process because they are often domain specific. In this paper, we propose a method for automatic expansion of abbreviations by using context and character information. In previous studies, dictionaries were used to search for abbreviation expansion candidates (candidate words for the original form of an abbreviation). We instead use a corpus from the same field that contains few abbreviations. We calculate the adequacy of abbreviation expansion candidates based on the similarity between the context of the target abbreviation and that of its expansion candidate. The similarity is calculated using a vector space model in which the vector elements correspond to the words surrounding the target abbreviation and those surrounding its expansion candidate. Experiments using approximately 10,000 documents in the field of aviation showed that the accuracy of the proposed method is 10% higher than that of previously developed methods.
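The context-similarity step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the helper names (`context_vector`, `cosine`, `is_subsequence`, `rank_expansions`), the two-word context window, raw co-occurrence counts as vector elements, and the ordered-subsequence test standing in for "character information" are all choices made here for the example.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of target."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            vec.update(tokens[max(0, i - window):i])
            vec.update(tokens[i + 1:i + 1 + window])
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def is_subsequence(abbrev, word):
    """Character check (assumed): abbreviation letters occur in order in word."""
    it = iter(word)
    return all(ch in it for ch in abbrev)


def rank_expansions(abbrev, abbrev_tokens, reference_tokens, candidates, window=2):
    """Rank candidates by context similarity to the abbreviation.

    abbrev_tokens: text containing the abbreviation.
    reference_tokens: same-field text with few abbreviations.
    """
    target_vec = context_vector(abbrev_tokens, abbrev, window)
    scored = [
        (cand, cosine(target_vec, context_vector(reference_tokens, cand, window)))
        for cand in candidates
        if is_subsequence(abbrev, cand)
    ]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


# Toy aviation-flavored example (invented text, not from the paper's corpus).
abbrev_text = "check alt before descent".split()
reference_text = "check altitude before descent and call the alternate airport".split()
ranked = rank_expansions(
    "alt", abbrev_text, reference_text, ["altitude", "alternate", "airport"]
)
print([cand for cand, _ in ranked])
```

Here "airport" is filtered out by the character check, and "altitude" ranks above "alternate" because its surrounding words match those around "alt". A realistic system would need proper tokenization, term weighting, and a much larger corpus.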