
Publication


Featured research published by Tsutomu Hirao.


International Conference on Computational Linguistics | 2002

Extracting important sentences with support vector machines

Tsutomu Hirao; Hideki Isozaki; Eisaku Maeda; Yuji Matsumoto

Extracting sentences that contain important information from a document is a form of text summarization. The technique is key to the automatic generation of summaries similar to those written by humans. To achieve such extraction, it is important to be able to integrate heterogeneous pieces of information. One approach, parameter tuning by machine learning, has been attracting a lot of attention. This paper proposes a method of sentence extraction based on Support Vector Machines (SVMs). To confirm the method's performance, we conduct experiments that compare our method to three existing methods. Results on the Text Summarization Challenge (TSC) corpus show that our method offers the highest accuracy. Moreover, we clarify which features are effective for extracting sentences from different document genres.
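The extraction scheme the abstract describes can be sketched as scoring each sentence with the linear decision function of a trained SVM and keeping the top-scoring sentences. A minimal illustration follows; the features and the weights W, b are hypothetical stand-ins for a learned model, not the paper's actual feature set.

```python
# Sketch of SVM-style extractive summarization: map each sentence to a
# feature vector, score it with a linear decision function w.x + b,
# and keep the k highest-scoring sentences in document order.

def features(sentence, position, total, keywords):
    words = [w.strip(".,;:") for w in sentence.lower().split()]
    return [
        1.0 - position / max(total - 1, 1),                      # earlier sentences score higher
        min(len(words) / 25.0, 1.0),                             # sentence length, capped
        sum(w in keywords for w in words) / max(len(words), 1),  # keyword density
    ]

def extract(sentences, keywords, k=2):
    W, b = [1.2, 0.4, 2.0], -0.5  # hypothetical learned SVM parameters
    scored = []
    for i, s in enumerate(sentences):
        x = features(s, i, len(sentences), keywords)
        scored.append((sum(wi * xi for wi, xi in zip(W, x)) + b, i, s))
    top = sorted(sorted(scored, reverse=True)[:k], key=lambda t: t[1])
    return [s for _, _, s in top]
```

A real system would learn W and b from labeled extract/non-extract sentences; only the scoring step is shown here.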


Journal of Biomedical Semantics | 2011

Coreference based event-argument relation extraction on biomedical text

Katsumasa Yoshikawa; Sebastian Riedel; Tsutomu Hirao; Masayuki Asahara; Yuji Matsumoto

This paper presents a new approach that exploits coreference information to extract event-argument (E-A) relations from biomedical documents. The approach has two advantages: (1) it can extract a large number of valuable E-A relations based on the concept of salience in discourse; (2) it enables us to identify E-A relations across sentence boundaries (cross-links) using the transitivity of coreference relations. We propose two coreference-based models: a pipeline based on Support Vector Machine (SVM) classifiers, and a joint Markov Logic Network (MLN). We show the effectiveness of these models on a biomedical event corpus. Both models outperform systems that do not use coreference information. When the two are compared, the joint MLN outperforms the pipeline SVM when gold coreference information is available.


Empirical Methods in Natural Language Processing | 2003

Japanese zero pronoun resolution based on ranking rules and machine learning

Hideki Isozaki; Tsutomu Hirao

Anaphora resolution is one of the most important research topics in Natural Language Processing. In English, overt pronouns such as "she" and definite noun phrases such as "the company" are anaphors that refer to preceding entities (antecedents). In Japanese, anaphors are often omitted; these omissions are called zero pronouns. There are two major approaches to zero pronoun resolution: the heuristic approach and the machine learning approach. Since various factors must be taken into consideration, it is difficult to find a good combination of heuristic rules. The machine learning approach is therefore attractive, but it requires a large amount of training data. In this paper, we propose a method that combines ranking rules and machine learning. The ranking rules are simple and effective, while machine learning can take more factors into account. Our experimental results show that this combination gives better performance than either of the two previous approaches alone.


Empirical Methods in Natural Language Processing | 2014

Dependency-based Discourse Parser for Single-Document Summarization

Yasuhisa Yoshida; Jun Suzuki; Tsutomu Hirao; Masaaki Nagata

The current state-of-the-art single-document summarization method generates a summary by solving a Tree Knapsack Problem (TKP): finding the optimal rooted subtree of the dependency-based discourse tree (DEP-DT) of a document. A gold DEP-DT can be obtained by transforming a gold Rhetorical Structure Theory-based discourse tree (RST-DT). However, there is still a large difference between the ROUGE scores of a system using a gold DEP-DT and a system using a DEP-DT obtained from an automatically parsed RST-DT. To improve the ROUGE score, we propose a novel discourse parser that directly generates the DEP-DT. The evaluation results showed that the TKP with our parser outperformed the TKP with the state-of-the-art RST-DT parser, and achieved ROUGE scores almost equivalent to those of the TKP with the gold DEP-DT.
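The Tree Knapsack Problem mentioned above asks for a rooted, connected subtree of the discourse tree that maximizes total unit score under a length budget. The paper's line of work formulates this as an integer linear program; as an illustration only, the same problem can be solved by a standard tree dynamic program, sketched here with hypothetical per-node costs and scores:

```python
# Tree Knapsack sketch: pick a connected subtree containing the root that
# maximizes total score with total cost <= budget. dp[b] at a node is the
# best score of a subtree rooted there using cost exactly b.

def tree_knapsack(children, cost, score, root, budget):
    NEG = float("-inf")

    def solve(v):
        dp = [NEG] * (budget + 1)
        if cost[v] <= budget:
            dp[cost[v]] = score[v]       # v alone (v must be in any subtree rooted at v)
        for c in children.get(v, []):
            cdp = solve(c)
            new = dp[:]                  # child subtree is optional
            for u in range(budget + 1):
                if dp[u] == NEG:
                    continue
                for cu in range(1, budget - u + 1):
                    if cdp[cu] != NEG:
                        new[u + cu] = max(new[u + cu], dp[u] + cdp[cu])
            dp = new
        return dp

    return max(solve(root))
```

This DP runs in O(n * budget^2); the ILP formulation used in the literature additionally accommodates richer constraints, which is why it is preferred in practice.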


Empirical Methods in Natural Language Processing | 2016

Neural Headline Generation on Abstract Meaning Representation

Sho Takase; Jun Suzuki; Naoaki Okazaki; Tsutomu Hirao; Masaaki Nagata

Neural network-based encoder-decoder models are an attractive recent methodology for tackling natural language generation tasks. This paper investigates whether structural syntactic and semantic information is useful when incorporated into a baseline neural attention-based model. We encode the output of an abstract meaning representation (AMR) parser using a modified version of Tree-LSTM. Our attention-based AMR encoder-decoder model improves headline generation benchmarks compared with the baseline neural attention-based model.


International Conference on Computational Linguistics | 2004

Corpus and evaluation measures for multiple document summarization with multiple sources

Tsutomu Hirao; Takahiro Fukusima; Manabu Okumura; Chikashi Nobata; Hidetsugu Nanba

In this paper, we introduce a large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus. We detail the corpus construction and evaluation measures. A significant feature of the corpus is that it annotates not only the important sentences in a document set but also which of those sentences share the same content. Moreover, we define new evaluation metrics that take redundancy into account and discuss the effectiveness of redundancy minimization.
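The redundancy-aware idea sketched in the abstract can be illustrated with a toy metric: if gold sentences are grouped into content units, a unit should earn credit only once, no matter how many selected sentences express it. The grouping scheme below is a hypothetical simplification, not the actual TSC3 annotation format.

```python
# Toy redundancy-aware precision: each selected sentence maps to a content
# unit (None = unimportant); a unit counts at most once, so selecting two
# sentences with the same content earns credit for only one of them.

def redundancy_aware_precision(selected, unit_of):
    covered = set()
    hits = 0
    for s in selected:
        u = unit_of.get(s)
        if u is not None and u not in covered:
            covered.add(u)
            hits += 1
    return hits / len(selected) if selected else 0.0
```

Under a naive precision, a summary repeating the same fact twice would score the same as one covering two distinct facts; this metric penalizes the former.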


Meeting of the Association for Computational Linguistics | 2014

Single Document Summarization based on Nested Tree Structure

Yuta Kikuchi; Tsutomu Hirao; Hiroya Takamura; Manabu Okumura; Masaaki Nagata

Many methods of text summarization combining sentence selection and sentence compression have recently been proposed. Although the dependency between words has been used in most of these methods, the dependency between sentences, i.e., rhetorical structures, has not been exploited in such joint methods. We used both dependency between words and dependency between sentences by constructing a nested tree, in which nodes in the document tree representing dependency between sentences were replaced by a sentence tree representing dependency between words. We formulated a summarization task as a combinatorial optimization problem, in which the nested tree was trimmed without losing important content in the source document. The results from an empirical evaluation revealed that our method based on the trimming of the nested tree significantly improved the summarization of texts.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

Text Summarization Challenge 2 text summarization evaluation at NTCIR workshop 3

Manabu Okumura; Takahiro Fukusima; Hidetsugu Nanba; Tsutomu Hirao

We report the outline of Text Summarization Challenge 2 (TSC2 hereafter), a second text summarization evaluation conducted as one of the tasks at the NTCIR Workshop 3. First, we briefly describe the previous evaluation, Text Summarization Challenge (TSC1), as an introduction to TSC2. Then we explain TSC2, including the participants, the two tasks, the data used, the evaluation methods for each task, and a brief report on the results. Lastly, we describe plans for the next evaluation, TSC3.


International Joint Conference on Natural Language Processing | 2009

A Syntax-Free Approach to Japanese Sentence Compression

Tsutomu Hirao; Jun Suzuki; Hideki Isozaki

Conventional sentence compression methods employ a syntactic parser to compress a sentence without changing its meaning. However, the reference compressions made by humans do not always retain the syntactic structures of the original sentences. Moreover, for the goal of on-demand sentence compression, the time spent in the parsing stage is not negligible. As an alternative to syntactic parsing, we propose a novel term weighting technique based on positional information within the original sentence, and a novel language model that combines statistics from the original sentence and a general corpus. Experiments involving both human subjective evaluations and automatic evaluations show that our method outperforms Hori's method, a state-of-the-art conventional technique. Because our method does not use a syntactic parser, it is 4.3 times faster than Hori's method.
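The two ingredients named in the abstract, a positional term weight and a language model interpolating sentence-internal statistics with general-corpus statistics, can be sketched as follows. The concrete weighting function and the interpolation constant lam are made-up illustrations, not the paper's formulas.

```python
# Sketch of the two scoring ingredients: a positional term weight and a
# unigram model interpolating counts from the original sentence with counts
# from a general corpus (lambda is a hypothetical mixing weight).

from collections import Counter

def term_weight(position, length):
    # hypothetical weighting: later positions weigh more, since the end of a
    # Japanese sentence often carries the predicate
    return (position + 1) / length

def interpolated_unigram(word, sentence_counts, corpus_counts, lam=0.7):
    p_sent = sentence_counts[word] / max(sum(sentence_counts.values()), 1)
    p_corp = corpus_counts[word] / max(sum(corpus_counts.values()), 1)
    return lam * p_sent + (1 - lam) * p_corp
```

A compression system along these lines would score candidate word subsequences by combining the two quantities and keep the best-scoring candidate under a length limit.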


Empirical Methods in Natural Language Processing | 2005

Kernel-based Approach for Automatic Evaluation of Natural Language Generation Technologies: Application to Automatic Summarization

Tsutomu Hirao; Manabu Okumura; Hideki Isozaki

In order to promote the study of automatic summarization and translation, we need an accurate automatic evaluation method that is close to human evaluation. In this paper, we present an evaluation method that is based on convolution kernels that measure the similarities between texts considering their substructures. We conducted an experiment using automatic summarization evaluation data developed for Text Summarization Challenge 3 (TSC-3). A comparison with conventional techniques shows that our method correlates more closely with human evaluations and is more robust.
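The kernel idea above can be illustrated with a much simpler relative: a word n-gram kernel that counts shared substructures between a system summary and a reference, normalized to a cosine-like similarity. The convolution kernels in the paper operate on richer substructures (such as trees), so this is only a flavor of the approach.

```python
# Toy kernel-style text similarity: k(x, y) counts shared word n-grams
# (weighted by occurrence counts), normalized as k(x,y)/sqrt(k(x,x)*k(y,y)).

from collections import Counter
import math

def ngram_kernel(x, y, n=2):
    def grams(text):
        w = text.split()
        return Counter(tuple(w[i:i + j])
                       for j in range(1, n + 1)
                       for i in range(len(w) - j + 1))
    gx, gy = grams(x), grams(y)
    return sum(gx[g] * gy[g] for g in gx if g in gy)

def kernel_similarity(x, y, n=2):
    denom = math.sqrt(ngram_kernel(x, x, n) * ngram_kernel(y, y, n))
    return ngram_kernel(x, y, n) / denom if denom else 0.0
```

Identical texts score 1.0 and texts sharing no n-grams score 0.0; an evaluation method of this family correlates the similarity scores with human judgments.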

Collaboration


An overview of Tsutomu Hirao's main collaborators.

Top Co-Authors

Masaaki Nagata | Nippon Telegraph and Telephone
Hideki Isozaki | Nippon Telegraph and Telephone
Jun Suzuki | Nippon Telegraph and Telephone
Eisaku Maeda | Nippon Telegraph and Telephone
Manabu Okumura | Tokyo Institute of Technology
Yuji Matsumoto | Nara Institute of Science and Technology
Tomoharu Iwata | Nippon Telegraph and Telephone
Yasuhisa Yoshida | Nara Institute of Science and Technology