Is this you? Create Your Porfile

Masayuki Asahara

Nara Institute of Science and Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Masayuki Asahara is active.

Explore More

Publication

Featured researches published by Masayuki Asahara.

north american chapter of the association for computational linguistics | 2003

Japanese Named Entity extraction with redundant morphological analysis

Masayuki Asahara; Yuji Matsumoto

Named Entity (NE) extraction is an important subtask of document processing such as information extraction and question answering. A typical method used for NE extraction of Japanese texts is a cascade of morphological analysis, POS tagging and chunking. However, there are some cases where segmentation granularity contradicts the results of morphological analysis and the building units of NEs, so that extraction of some NEs are inherently impossible in this setting. To cope with the unit problem, we propose a character-based chunking method. Firstly, the input sentence is analyzed redundantly by a statistical morphological analyzer to produce multiple (n-best) answers. Then, each character is annotated with its character types and its possible POS tags of the top n-best answers. Finally, a support vector machine-based chunker picks up some portions of the input sentence as NEs. This method introduces richer information to the chunker than previous methods that base on a single morphological analysis result. We apply our method to IREX NE extraction task. The cross validation result of the F-measure being 87.2 shows the superiority and effectiveness of the method.

international conference on computational linguistics | 2000

Extended models and tools for high-performance part-of-speech tagger

Masayuki Asahara; Yuji Matsumoto

Statistical part-of-speech (POS) taggers achieve high accuracy and robustness when based on large scale manually tagged corpora. However, enhancements of the learning models are necessary to achieve better performance. We are developing a learning tool for a Japanese morphological analyzer called ChaSen. Currently we use a fine-grained POS tag set with about 500 tags. To apply a normal tri gram model on the tag set, we need unrealistic size of corpora. Even, for a bi-gram model, we cannot prepare a moderate size of an annotated corpus, when we take all the tags as distinct. A usual technique to cope with such fine-grained tags is to reduce the size of the tag set by grouping the set of tags into equivalence classes. We introduce the concept of position-wise grouping where the tag set is partitioned into different equivalence classes at each position in the conditional probabilities in the Markov Model. Moreover, to cope with the data sparseness problem caused by exceptional phenomena, we introduce several other techniques such as word-level statistics, smoothing of word-level and POS-level statistics and a selective tri-gram model. To help users determine probabilistic parameters, we introduce an error-driven method for the parameter selection. We then give results of experiments to see the effect of the tools applied to an existing Japanese morphological analyzer.

international joint conference on natural language processing | 2009

Jointly Identifying Temporal Relations with Markov Logic

Katsumasa Yoshikawa; Sebastian Riedel; Masayuki Asahara; Yuji Matsumoto

Recent work on temporal relation identification has focused on three types of relations between events: temporal relations between an event and a time expression, between a pair of events and between an event and the document creation time. These types of relations have mostly been identified in isolation by event pairwise comparison. However, this approach neglects logical constraints between temporal relations of different types that we believe to be helpful. We therefore propose a Markov Logic model that jointly identifies relations of all three relation types simultaneously. By evaluating our model on the TempEval data we show that this approach leads to about 2% higher accuracy for all three types of relations ---and to the best results for the task when compared to those of other machine learning based systems.

meeting of the association for computational linguistics | 2003

Chinese Unknown Word Identification Using Character-based Tagging and Chunking

Chooi Ling Goh; Masayuki Asahara; Yuji Matsumoto

Since written Chinese has no space to delimit words, segmenting Chinese texts becomes an essential task. During this task, the problem of unknown word occurs. It is impossible to register all words in a dictionary as new words can always be created by combining characters. We propose a unified solution to detect unknown words in Chinese texts. First, a morphological analysis is done to obtain initial segmentation and POS tags and then a chunker is used to detect unknown words.

Journal of Biomedical Semantics | 2011

Coreference based event-argument relation extraction on biomedical text

Katsumasa Yoshikawa; Sebastian Riedel; Tsutomu Hirao; Masayuki Asahara; Yuji Matsumoto

This paper presents a new approach to exploit coreference information for extracting event-argument (E-A) relations from biomedical documents. This approach has two advantages: (1) it can extract a large number of valuable E-A relations based on the concept of salience in discourse; (2) it enables us to identify E-A relations over sentence boundaries (cross-links) using transitivity of coreference relations. We propose two coreference-based models: a pipeline based on Support Vector Machine (SVM) classifiers, and a joint Markov Logic Network (MLN). We show the effectiveness of these models on a biomedical event corpus. Both models outperform the systems that do not use coreference information. When the two proposed models are compared to each other, joint MLN outperforms pipeline SVM with gold coreference information.

meeting of the association for computational linguistics | 2007

NAIST.Japan: Temporal Relation Identification Using Dependency Parsed Tree

Yuchang Cheng; Masayuki Asahara; Yuji Matsumoto

In this paper, we attempt to use a sequence labeling model with features from dependency parsed tree for temporal relation identification. In the sequence labeling model, the relations of contextual pairs can be used as features for relation identification of the current pair. Head-modifier relations between pairs of words within one sentence can be also used as the features. In our preliminary experiments, these features are effective for the temporal relation identification tasks.

international conference on computational linguistics | 2004

Japanese unknown word identification by character-based chunking

Masayuki Asahara; Yuji Matsumoto

We introduce a character-based chunking for unknown word identification in Japanese text. A major advantage of our method is an ability to detect low frequency unknown words of unrestricted character type patterns. The method is built upon SVM-based chunking, by use of character n-gram and surrounding context of n-best word segmentation candidates from statistical morphological analysis as features. It is applied to newspapers and patent texts, achieving 95% precision and 55-70% recall for newspapers and more than 85% precision for patent texts.

Proceedings of the Second SIGHAN Workshop on Chinese Language Processing | 2003

Combining Segmenter and Chunker for Chinese Word Segmentation

Masayuki Asahara; Chooi Ling Goh; Xiaojie Wang; Yuji Matsumoto

Our proposed method is to use a Hidden Markov Model-based word segmenter and a Support Vector Machine-based chunker for Chinese word segmentation. Firstly, input sentences are analyzed by the Hidden Markov Model-based word segmenter. The word segmenter produces n-best word candidates together with some class information and confidence measures. Secondly, the extracted words are broken into character units and each character is annotated with the possible word class and the position in the word, which are then used as the features for the chunker. Finally, the Support Vector Machine-based chunker brings character units together into words so as to determine the word boundaries.

international conference on computational linguistics | 2008

Japanese Dependency Parsing Using a Tournament Model

Masakazu Iwatate; Masayuki Asahara; Yuji Matsumoto

In Japanese dependency parsing, Kudos relative preference-based method (Kudo and Matsumoto, 2005) outperforms both deterministic and probabilistic CKY-based parsing methods. In Kudos method, for each dependent word (or chunk) a log-linear model estimates relative preference of all other candidate words (or chunks) for being as its head. This cannot be considered in the deterministic parsing methods. We propose an algorithm based on a tournament model, in which the relative preferences are directly modeled by one-on-one games in a step-ladder tournament. In an evaluation experiment with Kyoto Text Corpus Version 4.0, the proposed method outperforms previous approaches, including the relative preference-based method.

conference on computational natural language learning | 2006

Multi-lingual Dependency Parsing at NAIST

Yuchang Cheng; Masayuki Asahara; Yuji Matsumoto

In this paper, we present a framework for multi-lingual dependency parsing. Our bottom-up deterministic parser adopts Nivres algorithm (Nivre, 2004) with a preprocessor. Support Vector Machines (SVMs) are utilized to determine the word dependency attachments. Then, a maximum entropy method (MaxEnt) is used for determining the label of the dependency relation. To improve the performance of the parser, we construct a tagger based on SVMs to find neighboring attachment as a preprocessor. Experimental evaluation shows that the proposed extension improves the parsing accuracy of our base parser in 9 languages. (Hajic et al., 2004; Simov et al., 2005; Simov and Osenova, 2003; Chen et al., 2003; Bohmova et al., 2003; Kromann, 2003; van der Beek et al., 2002; Brants et al., 2002; Kawata and Bartels, 2000; Afonso et al., 2002; Dzeroski et al., 2006; Civit and Marti, 2002; Nilsson et al., 2005; Oflazer et al., 2003; Atalay et al., 2003).

Explore More