Mamoru Komachi
Nara Institute of Science and Technology
Publication
Featured research published by Mamoru Komachi.
linguistic annotation workshop | 2007
Ryu Iida; Mamoru Komachi; Kentaro Inui; Yuji Matsumoto
In this paper, we discuss how to annotate coreference and predicate-argument relations in Japanese written text. There have been research activities for building Japanese text corpora annotated with coreference and predicate-argument relations, as in the Kyoto Text Corpus version 4.0 (Kawahara et al., 2002) and the GDA-Tagged Corpus (Hasida, 2005). However, there is still much room for refining their specifications. For this reason, we discuss issues in annotating these two types of relations, and propose a new specification for each. In accordance with the specification, we built a large-scale annotated corpus and examined its reliability. As a result of our current work, we have released an annotated corpus named the NAIST Text Corpus, which is used as the evaluation data set in the coreference and zero-anaphora resolution tasks in Iida et al. (2005) and Iida et al. (2006).
empirical methods in natural language processing | 2008
Mamoru Komachi; Taku Kudo; Masashi Shimbo; Yuji Matsumoto
Bootstrapping has a tendency, called semantic drift, to select instances unrelated to the seed instances as the iteration proceeds. We demonstrate that the semantic drift of bootstrapping has the same root as the topic drift of Kleinberg's HITS, using a simplified graph-based reformulation of bootstrapping. We confirm that two graph-based algorithms, the von Neumann kernels and the regularized Laplacian, can reduce semantic drift in the task of word sense disambiguation (WSD) on the Senseval-3 English Lexical Sample Task. The proposed algorithms achieve superior performance to Espresso and previous graph-based WSD methods, even though they have fewer parameters and are easy to calibrate.
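The abstract names the two kernels but does not give their formulas. The following is a minimal sketch, assuming the commonly used definitions: the von Neumann kernel K = A(I - βA)^{-1} and the regularized Laplacian R = (I + βL)^{-1} with L = D - A, where A is a symmetric instance-similarity matrix. The function names, the toy graph, and the use of a plain Gaussian-elimination solver are illustrative assumptions, not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def matvec(a: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def von_neumann_scores(adj: Matrix, seed: Vector, beta: float) -> Vector:
    """Score instances with K = A (I - beta A)^{-1} applied to the seed vector.

    Assumed definition; the underlying series converges only when beta is
    smaller than 1 / (spectral radius of A).
    """
    n = len(adj)
    system = [[(1.0 if i == j else 0.0) - beta * adj[i][j] for j in range(n)]
              for i in range(n)]
    return matvec(adj, solve(system, seed))


def regularized_laplacian_scores(adj: Matrix, seed: Vector, beta: float) -> Vector:
    """Score instances with R = (I + beta L)^{-1}, L = D - A (assumed definition)."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    system = [[(1.0 if i == j else 0.0)
               + beta * ((degree[i] if i == j else 0.0) - adj[i][j])
               for j in range(n)]
              for i in range(n)]
    return solve(system, seed)


if __name__ == "__main__":
    # Hypothetical toy graph: a path 0 - 1 - 2 - 3 with node 0 as the seed.
    toy_adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    toy_seed = [1.0, 0.0, 0.0, 0.0]
    print(von_neumann_scores(toy_adj, toy_seed, beta=0.3))
    print(regularized_laplacian_scores(toy_adj, toy_seed, beta=0.5))
```

In this sketch, a small β keeps the scores concentrated near the seed, while a larger β spreads weight over longer paths in the graph. Subtracting the degree matrix in the Laplacian variant discounts paths that pass through highly connected nodes.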
international joint conference on natural language processing | 2015
Shin Kanouchi; Mamoru Komachi; Naoaki Okazaki; Eiji Aramaki; Hiroshi Ishikawa
The development and proliferation of social media services has led to the emergence of new approaches for surveying the population and addressing social issues. One popular application of social media data is health surveillance, e.g., predicting the outbreak of an epidemic by recognizing diseases and symptoms from text messages posted on social media platforms. In this paper, we propose a novel task that is crucial and generic from the viewpoint of health surveillance: estimating a subject (carrier) of a disease or symptom mentioned in a Japanese tweet. By designing an annotation guideline for labeling the subject of a disease/symptom in a tweet, we perform annotations on an existing corpus for public surveillance. In addition, we present a supervised approach for predicting the subject of a disease/symptom. The results of our experiments demonstrate the impact of subject identification on the effective detection of an episode of a disease/symptom. Moreover, the results suggest that our task is independent of the type of disease/symptom.
meeting of the association for computational linguistics | 2009
Mamoru Komachi; Shimpei Makimoto; Kei Uchiumi; Manabu Sassano
As the web grows larger, knowledge acquisition from the web has gained increasing attention. In this paper, we propose using web search clickthrough logs to learn semantic categories. Experimental results show that the proposed method greatly outperforms previous work using only web search query logs.
meeting of the association for computational linguistics | 2016
Tomonori Kodaira; Tomoyuki Kajiwara; Mamoru Komachi
We propose a new dataset for evaluating a Japanese lexical simplification method. Previous datasets have several deficiencies. All of them substitute only a single target word, and some of them extract sentences only from a newswire corpus. In addition, most of these datasets do not allow ties and integrate simplification rankings from all the annotators without considering their quality. In contrast, our dataset has the following advantages: (1) it is the first controlled and balanced dataset for Japanese lexical simplification with high correlation with human judgment and (2) the consistency of the simplification ranking is improved by allowing candidates to have ties and by considering the reliability of annotators.
meeting of the association for computational linguistics | 2017
Yui Suzuki; Tomoyuki Kajiwara; Mamoru Komachi
We propose a novel sentential paraphrase acquisition method. To build a well-balanced corpus for paraphrase identification, we especially focus on acquiring both non-trivial positive and negative instances. We use multiple machine translation systems to generate positive candidates and a monolingual corpus to extract negative candidates. To collect non-trivial instances, the candidates are uniformly sampled by word overlap rate. Finally, annotators judge whether each candidate is positive or negative. Using this method, we built and released the first evaluation corpus for Japanese paraphrase identification, which comprises 655 sentence pairs.
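The sampling step can be illustrated with a short sketch. The abstract does not define the word overlap rate or the sampling procedure, so this example assumes a Jaccard overlap over token sets, equal-width overlap bins, and an equal quota per bin. Whitespace tokenization stands in for a real Japanese tokenizer, and all function and parameter names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def word_overlap_rate(a: str, b: str) -> float:
    """Jaccard overlap of the two token sets (an assumed definition)."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def sample_uniform_by_overlap(
    pairs: List[Pair], n_bins: int = 5, per_bin: int = 2, seed: int = 0
) -> List[Pair]:
    """Draw up to `per_bin` candidate pairs from each overlap-rate bin."""
    rng = random.Random(seed)
    bins: Dict[int, List[Pair]] = defaultdict(list)
    for first, second in pairs:
        rate = word_overlap_rate(first, second)
        # A rate of exactly 1.0 is folded into the last bin.
        index = min(int(rate * n_bins), n_bins - 1)
        bins[index].append((first, second))
    sampled: List[Pair] = []
    for index in range(n_bins):
        bucket = bins.get(index, [])
        sampled.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return sampled


if __name__ == "__main__":
    # Hypothetical candidates: many high-overlap pairs, few low-overlap ones.
    candidates = [("a b c d", "a b c e")] * 8 + [("a b", "c d")] * 2
    for pair in sample_uniform_by_overlap(candidates, n_bins=5, per_bin=2):
        print(pair, round(word_overlap_rate(*pair), 2))
```

Sampling an equal number of pairs per bin keeps high-overlap pairs from dominating the set, so pairs with high overlap that are not paraphrases and pairs with low overlap that are paraphrases both have a chance to reach the annotators.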
meeting of the association for computational linguistics | 2015
Yinchen Zhao; Mamoru Komachi; Hiroshi Ishikawa
In this study, we describe our system submitted to the 2nd Workshop on Natural Language Processing Techniques for Educational Applications (NLP-TEA-2) shared task on Chinese grammatical error diagnosis (CGED). We use a statistical machine translation method already applied to several similar tasks (Brockett et al., 2006; Chiu et al., 2013; Zhao et al., 2014). In this research, we examine corpus augmentation and explore alternative translation models including syntax-based and hierarchical phrase-based models. Finally, we show variations using different combinations of these factors.
meeting of the association for computational linguistics | 2015
Yoshiaki Kitagawa; Mamoru Komachi; Eiji Aramaki; Naoaki Okazaki; Hiroshi Ishikawa
Social media has attracted attention because of its potential for extraction of information of various types. For example, information collected from Twitter enables us to build useful applications such as predicting an epidemic of influenza. However, using text information from social media poses challenges for event detection because of the unreliable nature of user-generated texts, which often include counter-factual statements. Consequently, this study proposes the use of modality features to improve disease event detection from Twitter messages, or “tweets”. Experimental results demonstrate that the combination of a modality dictionary and a modality analyzer improves the F1-score by 3.5 points.
Handbook of Linguistic Annotation | 2017
Ryu Iida; Mamoru Komachi; Naoya Inoue; Kentaro Inui; Yuji Matsumoto
This chapter discusses how we decided the annotation schemes for predicate-argument and coreference relations in Japanese texts. Japanese is characterised by an extensive use of zero anaphors, which behave like pronouns in English. Furthermore, due to its lack of explicit definite articles (i.e. ‘the’ in English), manually identifying coreference relations is more difficult than in English. We designed our annotation specifications with this in mind, and then built a large-scale annotated corpus, which was released as the NAIST Text Corpus. In this chapter, we also present the details of the NAIST Text Corpus by comparing it to other similar corpora such as the Kyoto University Text Corpus (version 4.0) [14] and the Global Document Annotation (GDA)-Tagged Corpus [7].
international joint conference on natural language processing | 2011
Tomoya Mizumoto; Mamoru Komachi; Masaaki Nagata; Yuji Matsumoto