

Publication


Featured research published by Hideki Isozaki.


international conference on computational linguistics | 2002

Efficient support vector classifiers for named entity recognition

Hideki Isozaki; Hideto Kazawa

Named Entity (NE) recognition is a task in which proper nouns and numerical information are extracted from documents and are classified into categories such as person, organization, and date. It is a key technology of Information Extraction and Open-Domain Question Answering. First, we show that an NE recognizer based on Support Vector Machines (SVMs) gives better scores than conventional systems. However, off-the-shelf SVM classifiers are too inefficient for this task. Therefore, we present a method that makes the system substantially faster. This approach can also be applied to other similar tasks such as chunking and part-of-speech tagging. We also present an SVM-based feature selection method and an efficient training method.
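As an aside, the per-token classification setup this abstract describes can be sketched with off-the-shelf tools. The features, tags, and tiny training set below are invented for illustration; the paper's system uses far richer features (character types, dictionaries, wider context) and the efficiency tricks it proposes.

```python
# Illustrative sketch: NE recognition as per-token classification with a linear SVM.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def token_features(tokens, i):
    """Simple context-window features for token i (invented feature set)."""
    return {
        "word": tokens[i].lower(),
        "is_title": tokens[i].istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Toy training data with BIO-style labels.
train_sents = [
    (["John", "works", "at", "Google", "."], ["B-PER", "O", "O", "B-ORG", "O"]),
    (["Mary", "joined", "IBM", "today", "."], ["B-PER", "O", "B-ORG", "O", "O"]),
]
X = [token_features(s, i) for s, _ in train_sents for i in range(len(s))]
y = [tag for _, tags in train_sents for tag in tags]

vec = DictVectorizer()
clf = LinearSVC().fit(vec.fit_transform(X), y)

test = ["John", "joined", "Google", "."]
pred = clf.predict(vec.transform([token_features(test, i) for i in range(len(test))]))
print(list(pred))
```

A dense kernelized SVM would score every support vector per token; the speed-up method in the paper addresses exactly that cost.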


empirical methods in natural language processing | 2009

An Empirical Study of Semi-supervised Structured Conditional Models for Dependency Parsing

Jun Suzuki; Hideki Isozaki; Xavier Carreras; Michael Collins

This paper describes an empirical study of high-performance dependency parsers based on a semi-supervised learning approach. We describe an extension of semi-supervised structured conditional models (SS-SCMs) to the dependency parsing problem, whose framework was originally proposed by Suzuki and Isozaki (2008). Moreover, we introduce two extensions related to dependency parsing: the first is to combine SS-SCMs with another semi-supervised approach, described in Koo et al. (2008); the second is to apply the approach to second-order parsing models, such as those described in Carreras (2007), using a two-stage semi-supervised learning approach. We demonstrate the effectiveness of our proposed methods in dependency parsing experiments on two widely used test collections: the Penn Treebank for English and the Prague Dependency Treebank for Czech. Our best results on test data achieve 93.79% parent-prediction accuracy for English and 88.05% for Czech.
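The parent-prediction accuracy reported here is a simple per-token measure; a minimal sketch of how it is computed (the head indices below are hypothetical, with 0 denoting the artificial root):

```python
def parent_prediction_accuracy(pred_heads, gold_heads):
    """Fraction of tokens whose predicted head index matches the gold head.
    Each list gives, per token, the index of its head (0 = artificial root)."""
    assert len(pred_heads) == len(gold_heads)
    correct = sum(p == g for p, g in zip(pred_heads, gold_heads))
    return correct / len(gold_heads)

# A hypothetical 5-token sentence: gold heads vs. a parser's prediction.
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 2, 3, 3]
print(parent_prediction_accuracy(pred, gold))  # 4 of 5 heads correct -> 0.8
```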


international conference on computational linguistics | 2002

Extracting important sentences with support vector machines

Tsutomu Hirao; Hideki Isozaki; Eisaku Maeda; Yuji Matsumoto

Extracting sentences that contain important information from a document is a form of text summarization. The technique is key to the automatic generation of summaries similar to those written by humans. To achieve such extraction, it is important to be able to integrate heterogeneous pieces of information. One approach, parameter tuning by machine learning, has been attracting a lot of attention. This paper proposes a method of sentence extraction based on Support Vector Machines (SVMs). To confirm the method's performance, we conduct experiments that compare our method with three existing methods. Results on the Text Summarization Challenge (TSC) corpus show that our method offers the highest accuracy. Moreover, we clarify which features are effective for extraction in different document genres.
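SVM-based sentence extraction reduces to scoring each sentence and keeping the top-ranked ones; a sketch using decision-function scores over invented sentence-level features (a real system integrates many heterogeneous cues such as position, length, tf-idf, and named entities):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# Invented training data: 1 = important sentence, 0 = not important.
train = [
    ({"position": 0, "has_keyword": 1, "length": 20}, 1),
    ({"position": 5, "has_keyword": 0, "length": 8}, 0),
    ({"position": 1, "has_keyword": 1, "length": 18}, 1),
    ({"position": 7, "has_keyword": 0, "length": 5}, 0),
]
vec = DictVectorizer()
X = vec.fit_transform([f for f, _ in train])
svm = LinearSVC().fit(X, [y for _, y in train])

# Rank unseen sentences by distance from the separating hyperplane, keep the top k.
docs = [{"position": 0, "has_keyword": 1, "length": 15},
        {"position": 9, "has_keyword": 0, "length": 6}]
scores = svm.decision_function(vec.transform(docs))
top = max(range(len(docs)), key=lambda i: scores[i])
print(top)
```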


meeting of the association for computational linguistics | 2006

Left-to-Right Target Generation for Hierarchical Phrase-Based Translation

Taro Watanabe; Hajime Tsukada; Hideki Isozaki

We present a hierarchical phrase-based statistical machine translation model in which a target sentence is efficiently generated in left-to-right order. The model is a class of synchronous CFG with a Greibach Normal Form-like structure for the projected production rules: the paired target side of a production rule takes a phrase-prefixed form. The decoder for this target-normalized form is based on an Earley-style top-down parser on the source side. The target-normalized form coupled with our top-down parser implies left-to-right generation of translations, which enables straightforward integration with n-gram language models. We evaluated our model on a Japanese-to-English newswire translation task, where it showed statistically significant performance improvements over a phrase-based translation system.
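The practical payoff of left-to-right generation is that an n-gram language model can score each hypothesis incrementally as words are appended. A toy sketch with a hand-built bigram model (all probabilities invented; a real decoder estimates them from a corpus):

```python
import math

# Invented bigram log-probabilities.
bigram_logprob = {
    ("<s>", "the"): math.log(0.5), ("the", "cat"): math.log(0.2),
    ("cat", "sat"): math.log(0.3), ("sat", "</s>"): math.log(0.4),
}
UNK = math.log(1e-6)  # crude back-off for unseen bigrams

def score_incremental(words):
    """Score a hypothesis left to right, as a decoder extending it word by word could."""
    total, prev = 0.0, "<s>"
    for w in words + ["</s>"]:
        total += bigram_logprob.get((prev, w), UNK)
        prev = w
    return total

print(score_incremental(["the", "cat", "sat"]))
```

With right-to-left or bottom-up target generation, the LM context of a partial hypothesis is not yet known, which is what makes this integration awkward for other decoder designs.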


meeting of the association for computational linguistics | 2005

Boosting-based Parse Reranking with Subtree Features

Taku Kudo; Jun Suzuki; Hideki Isozaki

This paper introduces a new application of boosting to parse reranking. Several parsers have been proposed that utilize the all-subtrees representation (e.g., tree kernels and data-oriented parsing). This paper argues that such an all-subtrees representation is extremely redundant and that comparable accuracy can be achieved using just a small set of subtrees. We show how the boosting algorithm can be applied to the all-subtrees representation and how it selects a small and relevant feature set efficiently. Two experiments on parse reranking show that our method achieves performance comparable to or even better than that of kernel methods and also improves testing efficiency.
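The core move, treating each subtree as a binary feature and letting boosting pick a small relevant subset, can be mimicked with a toy AdaBoost loop over presence indicators. The "subtrees", data, and labels below are invented for illustration:

```python
import numpy as np

# Each column of X says whether a candidate parse contains that subtree.
subtrees = ["(NP (DT the) (NN dog))", "(VP (VBD ran))", "(PP (IN on))", "(NP (NN cat))"]
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 1]])
y = np.array([1, 1, -1, -1])  # invented rerank labels: +1 good parse, -1 bad

def boost(X, y, rounds=3):
    """AdaBoost with one-feature stumps h(x) = polarity * (2*x[f] - 1)."""
    n, d = X.shape
    w = np.full(n, 1 / n)
    chosen = []  # (feature index, polarity, stump weight)
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error this round.
        f, s = min(((f, s) for f in range(d) for s in (1, -1)),
                   key=lambda fs: np.sum(w[fs[1] * (2 * X[:, fs[0]] - 1) != y]))
        pred = s * (2 * X[:, f] - 1)
        err = max(np.sum(w[pred != y]), 1e-12)
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        chosen.append((f, s, alpha))
        w = w * np.exp(-alpha * y * pred)  # reweight toward mistakes
        w /= w.sum()
        ensemble = np.sign(sum(a * s2 * (2 * X[:, f2] - 1) for f2, s2, a in chosen))
        if np.all(ensemble == y):  # stop once training data is fit
            break
    return chosen

chosen = boost(X, y)
print([subtrees[f] for f, _, _ in chosen])  # the few subtrees boosting selected
```

Here a single subtree indicator already separates the toy data, which is the redundancy point in miniature: most of the all-subtrees space is never consulted.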


meeting of the association for computational linguistics | 2006

Training Conditional Random Fields with Multivariate Evaluation Measures

Jun Suzuki; Erik McDermott; Hideki Isozaki

This paper proposes a framework for training Conditional Random Fields (CRFs) to optimize multivariate evaluation measures, including non-linear measures such as F-score. Our proposed framework is derived from an error minimization approach that provides a simple solution for directly optimizing any evaluation measure. Specifically focusing on sequential segmentation tasks, i.e. text chunking and named entity recognition, we introduce a loss function that closely reflects the target evaluation measure for these tasks, namely, segmentation F-score. Our experiments show that our method performs better than standard CRF training.
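Segmentation F-score, the measure the proposed loss is built around, compares predicted labeled spans with gold spans; a minimal sketch (the example spans are hypothetical):

```python
def segmentation_f_score(pred_spans, gold_spans):
    """F-score over labeled spans, each a (start, end, label) tuple."""
    pred, gold = set(pred_spans), set(gold_spans)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    precision, recall = tp / len(pred), tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two gold entities; the prediction gets one exactly right and one boundary wrong.
gold = [(0, 2, "PER"), (5, 7, "ORG")]
pred = [(0, 2, "PER"), (4, 7, "ORG")]
print(segmentation_f_score(pred, gold))  # P = R = 1/2 -> F = 0.5
```

Because a single boundary error zeroes out a whole span, this measure is non-linear in per-token decisions, which is why standard likelihood training does not optimize it directly.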


meeting of the association for computational linguistics | 2001

Japanese Named Entity Recognition based on a Simple Rule Generator and Decision Tree Learning

Hideki Isozaki

Named entity (NE) recognition is a task in which proper nouns and numerical information in a document are detected and classified into categories such as person, organization, location, and date. NE recognition plays an essential role in information extraction systems and question answering systems. It is well known that hand-crafted systems with a large set of heuristic rules are difficult to maintain, and corpus-based statistical approaches are expected to be more robust and require less human intervention. Several statistical approaches have been reported in the literature. In a recent Japanese NE workshop, a maximum entropy (ME) system outperformed decision tree systems and most hand-crafted systems. Here, we propose an alternative method based on a simple rule generator and decision tree learning. Our experiments show that its performance is comparable to the ME approach. We also found that it can be trained more efficiently with a large set of training data and that it improves readability.


meeting of the association for computational linguistics | 2004

Convolution Kernels with Feature Selection for Natural Language Processing Tasks

Jun Suzuki; Hideki Isozaki; Eisaku Maeda

Convolution kernels, such as sequence and tree kernels, are advantageous in both concept and accuracy for many natural language processing (NLP) tasks. Experiments have, however, shown that the over-fitting problem often arises when these kernels are used in NLP tasks. This paper discusses this issue of convolution kernels and then proposes a new approach based on statistical feature selection that avoids it. To enable the proposed method to be executed efficiently, it is embedded into the original kernel calculation process using sub-structure mining algorithms. Experiments are undertaken on real NLP tasks to confirm the problem with the conventional method and to compare its performance with that of the proposed method.
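A sequence kernel of the kind discussed counts the sub-structures two inputs share. As a simplified stand-in for the gapped, decay-weighted kernels used in this line of work, here is a contiguous-subsequence (n-gram) variant:

```python
from collections import Counter

def ngram_kernel(s, t, max_n=3):
    """Count shared contiguous subsequences (n-grams) of length 1..max_n.
    Equals the inner product of the two sequences' n-gram count vectors."""
    def ngrams(seq):
        return Counter(tuple(seq[i:i + n])
                       for n in range(1, max_n + 1)
                       for i in range(len(seq) - n + 1))
    a, b = ngrams(s), ngrams(t)
    return sum(a[g] * b[g] for g in a)

# Shared: "the", "cat" (unigrams) and "the cat" (bigram) -> kernel value 3.
print(ngram_kernel("the cat sat".split(), "the cat ran".split()))
```

The implicit feature space here is every n-gram; the over-fitting issue the abstract raises comes from that space being huge and dominated by rare sub-structures, which is what the statistical feature selection prunes.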


empirical methods in natural language processing | 2003

Japanese zero pronoun resolution based on ranking rules and machine learning

Hideki Isozaki; Tsutomu Hirao

Anaphora resolution is one of the most important research topics in Natural Language Processing. In English, overt pronouns such as she and definite noun phrases such as the company are anaphors that refer to preceding entities (antecedents). In Japanese, anaphors are often omitted, and these omissions are called zero pronouns. There are two major approaches to zero pronoun resolution: the heuristic approach and the machine learning approach. Since we have to take various factors into consideration, it is difficult to find a good combination of heuristic rules. Therefore, the machine learning approach is attractive, but it requires a large amount of training data. In this paper, we propose a method that combines ranking rules and machine learning. The ranking rules are simple and effective, while machine learning can take more factors into account. Our experimental results show that this combination gives better performance than either of the two previous approaches alone.
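A sketch of the rule-plus-learning combination: simple salience rules order the candidate antecedents, and a learned scorer refines the choice among the top-ranked ones. The candidates, rules, and "model score" below are all invented for illustration:

```python
def rule_rank(candidates):
    """Heuristic ranking: topic-marked noun phrases outrank others,
    then more recent mentions outrank older ones."""
    return sorted(candidates, key=lambda c: (-c["is_topic"], -c["position"]))

def resolve(candidates, model_score):
    """The rules shortlist the top two candidates; the learned score
    (a stand-in for a trained classifier's confidence) picks between them."""
    shortlist = rule_rank(candidates)[:2]
    return max(shortlist, key=model_score)

candidates = [
    {"np": "Taro", "is_topic": 1, "position": 0},
    {"np": "the report", "is_topic": 0, "position": 1},
    {"np": "Hanako", "is_topic": 0, "position": 2},
]
toy_score = {"Taro": 0.9, "the report": 0.2, "Hanako": 0.6}
antecedent = resolve(candidates, lambda c: toy_score[c["np"]])
print(antecedent["np"])
```

The division of labor mirrors the paper's motivation: the rules need no training data, while the learned component absorbs the factors the rules cannot easily encode.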


international conference on acoustics, speech, and signal processing | 2003

Deriving disambiguous queries in a spoken interactive ODQA system

Chiori Hori; Takaaki Hori; Hideki Isozaki; Eisaku Maeda; Shigeru Katagiri; Sadaoki Furui

Recently, open-domain question answering (ODQA) systems that extract an exact answer from large text corpora based on text input have been intensively investigated. However, the information in the first question input by a user is not usually enough to yield the desired answer, so interaction is needed to collect the additional information required to accomplish QA. This paper proposes an interactive approach for spoken ODQA systems. When the reliability of the answer hypotheses obtained by an ODQA system is low, the system automatically derives disambiguous queries (DQs) that draw out additional information. The additional information obtained through the DQs should help to distinguish the exact answer effectively and to compensate for information lost through recognition errors. In our spoken interactive ODQA system, SPIQA, spoken questions are recognized by an ASR system, and DQs are automatically generated to disambiguate the transcribed questions. We confirmed the appropriateness of the derived DQs by comparing them with manually prepared ones.

Collaboration


Dive into Hideki Isozaki's collaborations.

Top Co-Authors

Tsutomu Hirao

Nippon Telegraph and Telephone

Hajime Tsukada

Nippon Telegraph and Telephone

Jun Suzuki

Nippon Telegraph and Telephone

Eisaku Maeda

Nippon Telegraph and Telephone

Katsuhito Sudoh

Nippon Telegraph and Telephone

Taro Watanabe

National Institute of Information and Communications Technology

Kohji Dohsaka

Nippon Telegraph and Telephone

Yutaka Sasaki

University of Manchester

Kevin Duh

Nara Institute of Science and Technology
