Nobuhiro Kaji | Researchain

Publication

Featured research published by Nobuhiro Kaji.

meeting of the association for computational linguistics | 2006

Automatic Construction of Polarity-Tagged Corpus from HTML Documents

Nobuhiro Kaji; Masaru Kitsuregawa

This paper proposes a novel method of building a polarity-tagged corpus from HTML documents. The method is fully automatic and can be applied to arbitrary HTML documents. The idea behind it is to utilize certain layout structures and linguistic patterns. By using them, we can automatically extract sentences that express opinions. In our experiment, the method constructed a corpus consisting of 126,610 sentences.

meeting of the association for computational linguistics | 2002

Verb Paraphrase based on Case Frame Alignment

Nobuhiro Kaji; Daisuke Kawahara; Sadao Kurohashi; Satoshi B. Sato

This paper describes a method of translating a predicate-argument structure of a verb into that of an equivalent verb, which is a core component of dictionary-based paraphrasing. Our method grasps several usages of a headword and those of the def-heads in the form of their case frames and aligns those case frames, which amounts to acquiring word sense disambiguation rules and detecting the appropriate equivalent and case marker transformation.

international conference on computational linguistics | 2000

Japanese case structure analysis by unsupervised construction of a case frame dictionary

Daisuke Kawahara; Nobuhiro Kaji; Sadao Kurohashi

In Japanese, case structure analysis is very important for handling several troublesome characteristics of the language, such as scrambling, omission of case components, and disappearance of case markers. However, for lack of a wide-coverage case frame dictionary, it has been difficult to perform case structure analysis accurately. Although several methods of constructing a case frame dictionary from analyzed corpora have been proposed, they cannot avoid the data sparseness problem. This paper proposes an unsupervised method of constructing a case frame dictionary from an enormous raw corpus by using a robust and accurate parser. It also provides a case structure analysis method based on the constructed dictionary.

asia pacific web conference | 2008

Socio-sense: a system for analysing the societal behavior from long term web archive

Masaru Kitsuregawa; Takayuki Tamura; Masashi Toyoda; Nobuhiro Kaji

We introduce the Socio-Sense Web analysis system. The system applies structural and temporal analysis methods to a long-term Web archive to obtain insight into the real society. We present an overview of the system and its core methods, followed by excerpts from case studies on consumer behavior analyses.

empirical methods in natural language processing | 2014

Accurate Word Segmentation and POS Tagging for Japanese Microblogs: Corpus Annotation and Joint Modeling with Lexical Normalization

Nobuhiro Kaji; Masaru Kitsuregawa

Microblogs have recently received widespread interest from NLP researchers. However, current tools for Japanese word segmentation and POS tagging still perform poorly on microblog texts. We developed an annotated corpus and proposed a joint model for overcoming this situation. Our annotated corpus of microblog texts enables not only training of accurate statistical models but also quantitative evaluation of their performance. Our joint model with lexical normalization handles the orthographic diversity of microblog texts. We conducted an experiment to demonstrate that the corpus and model substantially contribute to boosting accuracy.

meeting of the association for computational linguistics | 2016

Prediction of Prospective User Engagement with Intelligent Assistants

Shumpei Sano; Nobuhiro Kaji; Manabu Sassano

Intelligent assistants on mobile devices, such as Siri, have recently gained considerable attention as novel applications of dialogue technologies. The tremendous number of real users of intelligent assistants provides us with an opportunity to explore a novel task: predicting whether users will continue to use their intelligent assistants in the future. We developed prediction models of prospective user engagement by using large-scale user logs obtained from a commercial intelligent assistant. Experiments demonstrated that our models can predict prospective user engagement reasonably well and outperform a strong baseline that makes predictions based on past utterance frequency.

international conference on computational linguistics | 2008

Using Hidden Markov Random Fields to Combine Distributional and Pattern-Based Word Clustering

Nobuhiro Kaji; Masaru Kitsuregawa

Word clustering is a conventional and important NLP task, and the literature has suggested two kinds of approaches to this problem. One is based on distributional similarity, and the other relies on the co-occurrence of two words in lexicosyntactic patterns. Although the two methods have been discussed separately, combining them is promising since they are complementary to each other. This paper proposes to integrate them using hidden Markov random fields and demonstrates the effectiveness of this integration through experiments.

international joint conference on natural language processing | 2005

Lexical choice via topic adaptation for paraphrasing written language to spoken language

Nobuhiro Kaji; Sadao Kurohashi

Our research aims at developing a system that paraphrases written language text into spoken language style. In such a system, it is important to distinguish between words in an input text that are appropriate for spoken language and those that are not. We call this task lexical choice for paraphrasing. In this paper, we describe a method of lexical choice that considers the topic. Our method is based on word probabilities in written and spoken language corpora. The novelty of our method is topic adaptation. In our framework, the corpora are classified into topic categories, and the probability is estimated using the corpora that have the same topic as the input text. The evaluation results showed the effectiveness of topic adaptation.
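The topic-adapted comparison of word probabilities described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the add-alpha smoothing, the ratio threshold, the function names, and the toy counts are all assumptions introduced here for illustration only.

```python
from collections import Counter


def word_probability(word, counts, vocab_size, alpha=1.0):
    """Add-alpha smoothed unigram probability of `word` (smoothing is an assumption)."""
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * vocab_size)


def is_spoken_appropriate(word, topic, written, spoken, threshold=1.0):
    """Hypothetical decision rule: a word counts as appropriate for spoken
    language when its probability in the spoken corpus of the given topic is
    at least `threshold` times its probability in the written corpus of the
    same topic."""
    w_counts = written[topic]
    s_counts = spoken[topic]
    # Shared vocabulary so both estimates use the same smoothing mass.
    vocab = set(w_counts) | set(s_counts) | {word}
    p_written = word_probability(word, w_counts, len(vocab))
    p_spoken = word_probability(word, s_counts, len(vocab))
    return p_spoken / p_written >= threshold


# Toy, invented counts per topic category (not data from the paper).
written = {"politics": Counter({"legislation": 8, "law": 2, "vote": 5})}
spoken = {"politics": Counter({"legislation": 1, "law": 9, "vote": 5})}

print(is_spoken_appropriate("law", "politics", written, spoken))
print(is_spoken_appropriate("legislation", "politics", written, spoken))
```

Restricting both estimates to corpora of the input text's topic is the step that corresponds to topic adaptation; without it, the counts would be pooled over all topics.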

conference on computational natural language learning | 2015

Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable Context Pairs

Shonosuke Ishiwatari; Nobuhiro Kaji; Naoki Yoshinaga; Masashi Toyoda; Masaru Kitsuregawa

We propose a method that learns a cross-lingual projection of word representations from one language into another. Our method utilizes translatable context pairs as bonus terms of the objective function. In the experiments, our method outperformed existing methods on three language pairs, (English, Spanish), (Japanese, Chinese), and (English, Japanese), without using any additional supervision.

meeting of the association for computational linguistics | 2017

Chat Detection in an Intelligent Assistant: Combining Task-oriented and Non-task-oriented Spoken Dialogue Systems

Satoshi Akasaki; Nobuhiro Kaji

Recently emerged intelligent assistants on smartphones and home electronics (e.g., Siri and Alexa) can be seen as novel hybrids of domain-specific task-oriented spoken dialogue systems and open-domain non-task-oriented ones. To realize such hybrid dialogue systems, this paper investigates determining whether or not a user is going to have a chat with the system. To address the lack of benchmark datasets for this task, we construct a new dataset consisting of 15,160 utterances collected from the real log data of a commercial intelligent assistant (and will release the dataset to facilitate future research activity). In addition, we investigate using tweets and Web search queries for handling open-domain user utterances, which characterize the task of chat detection. Experiments demonstrated that, while simple supervised methods are effective, the use of the tweets and search queries further improves the F1-score from 86.21 to 87.53.