Tomoya Iwakura | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Tomoya Iwakura is active.

Explore More

Publication

Featured researches published by Tomoya Iwakura.

meeting of the association for computational linguistics | 2003

Text Simplification for Reading Assistance: A Project Note

Kentaro Inui; Atsushi Fujita; Tetsuro Takahashi; Ryu Iida; Tomoya Iwakura

This paper describes our ongoing research project on text simplification for congenitally deaf people. Text simplification we are aiming at is the task of offering a deaf reader a syntactic and lexical paraphrase of a given text for assisting her/him to understand what it means. In this paper, we discuss the issues we should address to realize text simplification and report on the present results in three different aspects of this task: readability assessment, paraphrase representation and post-transfer error detection.

conference on computational natural language learning | 2008

A Fast Boosting-based Learner for Feature-Rich Tagging and Chunking

Tomoya Iwakura; Seishi Okamoto

Combination of features contributes to a significant improvement in accuracy on tasks such as part-of-speech (POS) tagging and text chunking, compared with using atomic features. However, selecting combination of features on learning with large-scale and feature-rich training data requires long training time. We propose a fast boosting-based algorithm for learning rules represented by combination of features. Our algorithm constructs a set of rules by repeating the process to select several rules from a small proportion of candidate rules. The candidate rules are generated from a subset of all the features with a technique similar to beam search. Then we propose POS tagging and text chunking based on our learning algorithm. Our tagger and chunker use candidate POS tags or chunk tags of each word collected from automatically tagged data. We evaluate our methods with English POS tagging and text chunking. The experimental results show that the training time of our algorithm are about 50 times faster than Support Vector Machines with polynomial kernel on the average while maintaining state-of-the-art accuracy and faster classification speed.

meeting of the association for computational linguistics | 2016

Comparison of Annotating Methods for Named Entity Corpora.

Kanako Komiya; Masaya Suzuki; Tomoya Iwakura; Minoru Sasaki; Hiroyuki Shinnou

We compared two methods to annotate a corpus via non-expert annotators for named entity (NE) recognition task, which are (1) revising the results of the existing NE recognizer and (2) annotating NEs only by hand. We investigated the annotation time, the degrees of agreement, and the performances based on the gold standard. As we have two annotators for one file of each method, we evaluated the two performances, which are the averaged performances over the two annotators and the performances deeming the annotations correct when either of them is correct. The experiments revealed that the semi-automatic annotation was faster and showed better agreements and higher performances on average. However they also indicated that sometimes fully manual annotation should be used for some texts whose genres are far from its training data. In addition, the experiments using the annotated corpora via semi-automatic and fully manual annotation as training data for machine learning indicated that the F-measures sometimes could be better for some texts when we used manual annotation than when we used semi-automatic annotation.

Proceedings of the Sixth Named Entity Workshop | 2016

Constructing a Japanese Basic Named Entity Corpus of Various Genres

Tomoya Iwakura; Kanako Komiya; Ryuichi Tachibana

This paper introduces a Japanese Named Entity (NE) corpus of various genres. We annotated 136 documents in the Balanced Corpus of Contemporary Written Japanese (BCCWJ) with the eight types of NE tags defined by Information Retrieval and Extraction Exercise. The NE corpus consists of six types of genres of documents such as blogs, magazines, white papers, and so on, and the corpus contains 2,464 NE tags in total. The corpus can be reproduced with BCCWJ corpus and the tagging information obtained from https://sites.google.com/ site/projectnextnlpne/en/ .

pacific rim international conference on artificial intelligence | 2016

Fast training of a graph boosting for large-scale text classification

Hiyori Yoshikawa; Tomoya Iwakura

This paper proposes a fast training method for graph classification based on a boosting algorithm and its application to sentimental analysis with input texts represented by graphs. Graph format is very suitable for representing texts structured with Natural Language Processing techniques such as morphological analysis, Named Entity Recognition, and parsing. A number of classification methods which represent texts as graphs have been proposed so far. However, many of them limit candidate features in advance because of quite large size of feature space. Instead of limiting search space in advance, we propose two approximation methods for learning of graph-based rules in a boosting. Experimental results on a sentimental analysis dataset show that our method contributes to improved training speed. In addition, the graph representation-based classification method exploits rich structural information of texts, which is impossible to be detected when using other simpler input formats, and shows higher accuracy.

pacific rim international conference on artificial intelligence | 2014

An AdaBoost for Efficient Use of Confidences of Weak Hypotheses on Text Categorization

Tomoya Iwakura; Takahiro Saitou; Seishi Okamoto

We propose a boosting algorithm based on AdaBoost for using real-valued weak hypotheses that return confidences of their classifications as real numbers with an approximated upper bound of the training error. The approximated upper bound is induced with Bernoulli’s inequality and the upper bound enables us to analytically calculate a confidence-value that satisfies a reduction in the original upper bound. The experimental results on the Reuters-21578 data set and an Amazon review data show that our boosting algorithm with the perceptron attains better accuracy than Support Vector Machines, decision stumps-based boosting algorithms and a perceptron.

ACM Transactions on Asian Language Information Processing | 2013

A Named Entity Recognition Method Based on Decomposition and Concatenation of Word Chunks

Tomoya Iwakura; Hiroya Takamura; Manabu Okumura

We propose a named entity (NE) recognition method in which word chunks are repeatedly decomposed and concatenated. Our method identifies word chunks with a base chunker, such as a noun phrase chunker, and then recognizes NEs from the recognized word chunk sequences. By using word chunks, we can obtain features that cannot be obtained in word-sequence-based recognition methods, such as the first word of a word chunk, the last word of a word chunk, and so on. However, each word chunk may include a part of an NE or multiple NEs. To solve this problem, we use the following operators: SHIFT for separating the first word from a word chunk, POP for separating the last word from a word chunk, JOIN for concatenating two word chunks, and REDUCE for assigning an NE label to a word chunk. We evaluate our method on a Japanese NE recognition dataset that includes about 200,000 annotations of 191 types of NEs from over 8,500 news articles. The experimental results show that the training and processing speeds of our method are faster than those of a linear-chain structured perceptron and a semi-Markov perceptron, while maintaining high accuracy.

international conference on computational linguistics | 2010

A named entity extraction using word information repeatedly collected from unlabeled data

Tomoya Iwakura

This paper proposes a method for Named Entity (NE) extraction using NE-related labels of words repeatedly collected from unlabeled data. NE-related labels of words are candidate NE classes of each word, NE classes of co-occurring words of each word, and so on. To collect NE-related labels of words, we extract NEs from unlabeled data with an NE extractor. Then we collect NE-related labels of words from the extraction results. We create a new NE extractor using the NE-related labels of each word as new features. The new NE extractor is used to collect new NE-related labels of words. The experimental results using IREX data set for Japanese NE extraction show that our method contributes improved accuracy.

acm transactions on asian and low-resource language information processing | 2018

Comparison of Methods to Annotate Named Entity Corpora

Kanako Komiya; Masaya Suzuki; Tomoya Iwakura; Minoru Sasaki; Hiroyuki Shinnou

The authors compared two methods for annotating a corpus for the named entity (NE) recognition task using non-expert annotators: (i) revising the results of an existing NE recognizer and (ii) manually annotating the NEs completely. The annotation time, degree of agreement, and performance were evaluated based on the gold standard. Because there were two annotators for one text for each method, two performances were evaluated: the average performance of both annotators and the performance when at least one annotator is correct. The experiments reveal that semi-automatic annotation is faster, achieves better agreement, and performs better on average. However, they also indicate that sometimes, fully manual annotation should be used for some texts whose document types are substantially different from the training data document types. In addition, the machine learning experiments using semi-automatic and fully manually annotated corpora as training data indicate that the F-measures could be better for some texts when manual instead of semi-automatic annotation was used. Finally, experiments using the annotated corpora for training as additional corpora show that (i) the NE recognition performance does not always correspond to the performance of the NE tag annotation and (ii) the system trained with the manually annotated corpus outperforms the system trained with the semi-automatically annotated corpus with respect to newswires, even though the existing NE recognizer was mainly trained with newswires.

recent advances in natural language processing | 2017

An Eye-tracking Study of Named Entity Annotation.

Takenobu Tokunaga; Hitoshi Nishikawa; Tomoya Iwakura

Utilising effective features in machine learning-based natural language processing (NLP) is crucial in achieving good performance for a given NLP task. The paper describes a pilot study on the analysis of eye-tracking data during named entity (NE) annotation, aiming at obtaining insights into effective features for the NE recognition task. The eye gaze data were collected from 10 annotators and analysed regarding working time and fixation distribution. The results of the preliminary qualitative analysis showed that human annotators tend to look at broader contexts around the target NE than recent state-of-the-art automatic NE recognition systems and to use predicate argument relations to identify the NE categories.

Explore More