Publication


Featured research published by Tadashi Nomoto.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001

A new approach to unsupervised text summarization

Tadashi Nomoto; Yuji Matsumoto

The paper presents a novel approach to unsupervised text summarization. The novelty lies in exploiting the diversity of concepts in text for summarization, which has not received much attention in the summarization literature. The diversity-based approach here is a principled generalization of the Maximal Marginal Relevance criterion of Carbonell and Goldstein (1998). We propose, in addition, an information-centric approach to evaluation, where the quality of summaries is judged not in terms of how well they match human-created summaries but in terms of how well they represent their source documents in IR tasks such as document retrieval and text categorization. To assess the effectiveness of our approach under the proposed evaluation scheme, we examine how a system with the diversity functionality performs against one without, using the BMIR-J2 corpus, a test collection developed by a Japanese research consortium. The results demonstrate a clear superiority of the diversity-based approach over a non-diversity-based approach.
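For readers unfamiliar with the Maximal Marginal Relevance (MMR) criterion that the abstract above refers to, the following is a minimal sketch of greedy MMR sentence selection. It illustrates the baseline criterion only, not the diversity-based method of the paper. The function names `cosine` and `mmr_select`, the bag-of-words cosine similarity, and the default weight `lam=0.5` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, k=2, lam=0.5):
    """Greedily pick k sentences by MMR (hypothetical helper).

    Each step picks the candidate maximizing
        lam * sim(candidate, query) - (1 - lam) * max sim(candidate, selected),
    so lam=1.0 ranks purely by relevance and smaller values penalize
    redundancy with sentences already chosen.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    q = Counter(query.lower().split())
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to any selected sentence.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * cosine(vecs[i], q) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With `lam=1.0` the selection reduces to plain relevance ranking, so near-duplicate sentences can both be chosen; lowering `lam` trades relevance for coverage of different content.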


Meeting of the Association for Computational Linguistics | 2004

Multi-Engine Machine Translation with Voted Language Model

Tadashi Nomoto

The paper describes a particular approach to multi-engine machine translation (MEMT), where we make use of voted language models to selectively combine translation outputs from multiple off-the-shelf MT systems. Experiments are done using large corpora from three distinct domains. The study found that the use of voted language models leads to improved performance of MEMT systems.


Information Processing and Management | 2007

Discriminative sentence compression with conditional random fields

Tadashi Nomoto

The paper focuses on a particular approach to automatic sentence compression which makes use of a discriminative sequence classifier known as Conditional Random Fields (CRF). We devise several features for CRF that allow it to incorporate information on nonlinear relations among words. Along with that, we address the issue of data paucity by collecting data from RSS feeds available on the Internet, and turning them into training data for use with CRF, drawing on techniques from biology and information retrieval. We also discuss a recursive application of CRF on the syntactic structure of a sentence as a way of improving the readability of the compression it generates. Experiments found that our approach works reasonably well compared to the state-of-the-art system [Knight, K., & Marcu, D. (2002). Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence 139, 91-107.].


Empirical Methods in Natural Language Processing | 2009

A Comparison of Model Free versus Model Intensive Approaches to Sentence Compression

Tadashi Nomoto

This work introduces a model-free approach to sentence compression, which grew out of ideas from Nomoto (2008), and examines how it compares to a state-of-the-art model-intensive approach known as Tree-to-Tree Transducer, or T3 (Cohn and Lapata, 2008). It is found that the model-free approach significantly outperforms T3 on the particular data we created from the Internet. We also discuss what might have caused T3's poor performance.


Information Processing and Management | 2003

The diversity-based approach to open-domain text summarization

Tadashi Nomoto; Yuji Matsumoto

The paper introduces a novel approach to unsupervised text summarization, which in principle should work for any domain or genre. The novelty lies in exploiting the diversity of concepts in text for summarization, which has not received much attention in the summarization literature. We propose, in addition, what we call the information-centric approach to evaluation, where the quality of summaries is judged not in terms of how well they match human-created summaries but in terms of how well they represent their source documents in IR tasks such as document retrieval and text categorization. To assess the effectiveness of our approach under the proposed evaluation scheme, we examine how a system with the diversity functionality performs against one without, using the test data known as BMIR-J2. The results demonstrate a clear superiority of the diversity-based approach over a non-diversity-based approach. The paper also addresses the question of how closely the diversity approach models human judgments on summarization. We have created a relatively large volume of data annotated for relevance to summarization by human subjects. We have trained a decision-tree-based summarizer using the data, and examined how the diversity method compares with the supervised method in performance when tested on the data. It was found that the diversity approach performs as well as, and in some cases better than, the supervised method.


International Conference on Data Mining | 2001

An experimental comparison of supervised and unsupervised approaches to text summarization

Tadashi Nomoto; Yuji Matsumoto

The paper presents a direct comparison of supervised and unsupervised approaches to text summarization. As a representative supervised method, we use the C4.5 decision tree algorithm, extended with the minimum description length (MDL) principle, and compare it against several unsupervised methods. It is found that a particular unsupervised method, based on an extension of the K-means clustering algorithm, performs as well as, and in some cases better than, the decision-tree-based method.


Conference on Information and Knowledge Management | 2011

WikiLabel: an encyclopedic approach to labeling documents en masse

Tadashi Nomoto

This paper presents a particular approach to collective labeling of multiple documents, which works by associating the documents with Wikipedia pages and labeling them with headings the pages carry. The approach has an obvious advantage over past approaches in that it is able to produce fluent labels, as they are hand-written by human editors. We carried out some experiments on the TDT5 dataset, which found that the approach works rather robustly for an arbitrary set of documents in the news domain. Comparisons were made with some baselines, including the state of the art, with results strongly in favor of our approach.


Empirical Methods in Natural Language Processing | 2005

Bayesian Learning in Text Summarization

Tadashi Nomoto

The paper presents a Bayesian model for text summarization, which explicitly encodes and exploits information on how human judgments are distributed over the text. Comparison is made against non-Bayesian summarizers, using test data from Japanese news texts. It is found that the Bayesian approach generally improves the performance of a summarizer, at times giving it a significant lead over non-Bayesian models.


Exploiting Semantic Annotations in Information Retrieval | 2012

Conceptualizing documents with Wikipedia

Tadashi Nomoto; Noriko Kando

In this work, we discuss how to improve WikiLabel, an approach which makes use of titles in Wikipedia pages to generate labels for documents, by retooling ideas from story link detection (SLD). A comparison of our approach against Elastic Net, a powerful machine learner, on real-world data finds our approach visibly superior to the latter.


Meeting of the Association for Computational Linguistics | 2002

Supervised Ranking in Open-Domain Text Summarization

Tadashi Nomoto; Yuji Matsumoto

The paper proposes and empirically motivates an integration of supervised learning with unsupervised learning to deal with human biases in summarization. In particular, we explore the use of probabilistic decision trees within the clustering framework to account for the variation as well as regularity in human-created summaries. A corpus of human-created extracts is built from a newspaper corpus and used as a test set. We build probabilistic decision trees of different flavors and integrate each of them with the clustering framework. Experiments with the corpus demonstrate that the mixture of the two paradigms generally gives a significant boost in performance compared to cases where either of the two is considered alone.

Collaboration


Dive into Tadashi Nomoto's collaborations.

Top Co-Authors

Yuji Matsumoto

Nara Institute of Science and Technology

Noriko Kando

National Institute of Informatics
