Jun’ichi Kazama
Japan Advanced Institute of Science and Technology
Publication
Featured research published by Jun’ichi Kazama.
Meeting of the Association for Computational Linguistics | 2002
Jun’ichi Kazama; Takaki Makino; Yoshihiro Ohta; Jun’ichi Tsujii
We explore the use of Support Vector Machines (SVMs) for biomedical named entity recognition. To make SVM training tractable on the largest available corpus - the GENIA corpus - we propose splitting the non-entity class into sub-classes using part-of-speech information. In addition, we explore new features such as word cache and the states of an HMM trained by unsupervised learning. Experiments on the GENIA corpus show that our class-splitting technique not only enables training with the GENIA corpus but also improves accuracy. The proposed new features also contribute to improving accuracy. We compare our SVM-based recognition system with a system using the Maximum Entropy tagging method.
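The class-splitting technique can be illustrated in isolation: the single non-entity label "O" is subdivided by part-of-speech tag, so each one-vs-rest SVM sub-problem is smaller and more homogeneous, and the sub-labels are merged back after prediction. A minimal sketch, with invented function names and toy GENIA-style tokens (not the paper's code):

```python
# Split the non-entity class "O" into POS-based sub-classes for training,
# then collapse predicted sub-classes back to plain "O" at decoding time.

def split_o_class(tokens):
    """Replace each 'O' label with 'O-<POS>' (e.g. 'O-NN')."""
    return [(word, pos, f"O-{pos}" if label == "O" else label)
            for word, pos, label in tokens]

def merge_o_class(labels):
    """Collapse predicted 'O-<POS>' sub-classes back to plain 'O'."""
    return ["O" if lab.startswith("O-") else lab for lab in labels]

tokens = [("IL-2", "NN", "B-protein"), ("gene", "NN", "I-protein"),
          ("expression", "NN", "O"), ("requires", "VBZ", "O")]
split = split_o_class(tokens)
# An SVM would be trained on the split labels; predictions are merged back:
restored = merge_o_class([lab for _, _, lab in split])
```

The merge step guarantees that the sub-classing is invisible to evaluation: only training is affected.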
Empirical Methods in Natural Language Processing | 2003
Jun’ichi Kazama; Jun’ichi Tsujii
A maximum entropy (ME) model is usually estimated so that it conforms to equality constraints on feature expectations. However, the equality constraint is inappropriate for sparse and therefore unreliable features. This study explores an ME model with box-type inequality constraints, where the equality can be violated to reflect this unreliability. We evaluate the inequality ME model using text categorization datasets. We also propose an extension of the inequality ME model, which results in a natural integration with the Gaussian MAP estimation. Experimental results demonstrate the advantage of the inequality models and the proposed extension.
International Conference on Computational Linguistics | 2008
Stijn De Saeger; Kentaro Torisawa; Jun’ichi Kazama
This paper presents a method for mining potential troubles or obstacles related to the use of a given object. Example instances of this relation are (medicine, side effect) and (amusement park, height restriction). Our acquisition method consists of three steps. First, we use an unsupervised method to collect training samples from Web documents. Second, a set of expressions generally referring to troubles is acquired by a supervised learning method. Finally, the acquired troubles are associated with objects so that each of the resulting pairs consists of an object and a trouble or obstacle in using that object. To show the effectiveness of our method, we conducted experiments using a large collection of Japanese Web documents. Experimental results show 85.5% precision for the top 10,000 acquired troubles and 74% precision for the top 10% of over 60,000 acquired object-trouble pairs.
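The final pairing step can be sketched with a toy lexico-syntactic pattern. The paper works on Japanese text; the English template "<trouble> of <object>" and the tiny trouble lexicon below are invented stand-ins for the learned Japanese expressions:

```python
import re

# Toy version of step 3: associate known trouble expressions with the
# object they apply to, via a hand-made pattern. The lexicon and pattern
# are illustrative, not the paper's learned resources.
TROUBLE_WORDS = {"side effect", "height restriction", "waiting time"}

def extract_pairs(sentence):
    pairs = []
    for trouble in TROUBLE_WORDS:
        m = re.search(rf"{trouble} of (?:the |this )?(\w+)", sentence)
        if m:
            pairs.append((m.group(1), trouble))
    return pairs

print(extract_pairs("Check the side effect of this medicine first."))
# -> [('medicine', 'side effect')]
```

In the actual method the trouble lexicon is learned (step 2) rather than listed by hand, which is what lets it scale to tens of thousands of pairs.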
International Joint Conference on Natural Language Processing | 2005
Kosuke Tokunaga; Jun’ichi Kazama; Kentaro Torisawa
We propose a method for acquiring attribute words for a wide range of objects from Japanese Web documents. The method is a simple unsupervised method that utilizes the statistics of words, lexico-syntactic patterns, and HTML tags. To evaluate the attribute words, we also establish criteria and a procedure based on the question-answerability of the candidate words.
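The word-statistics component can be sketched as frequency counting over a genitive pattern. The paper uses Japanese patterns (roughly "O no A") plus HTML-tag cues; the English "the A of the O" pattern below is an illustrative substitute:

```python
import re
from collections import Counter

# Toy unsupervised attribute acquisition: count candidate attribute
# words A occurring in "the A of the O" for a target object O, and
# rank them by frequency. Pattern and corpus are invented examples.
def attribute_counts(corpus, obj):
    pat = re.compile(rf"the (\w+) of the {obj}")
    return Counter(a for sent in corpus for a in pat.findall(sent))

corpus = ["the price of the camera dropped",
          "the weight of the camera matters",
          "the price of the camera is high"]
print(attribute_counts(corpus, "camera").most_common(1))
# -> [('price', 2)]
```

Frequent fillers of the pattern ("price", "weight") are good attribute candidates; the question-answerability test then filters them ("What is the price of this camera?" is a sensible question).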
Machine Learning | 2005
Jun’ichi Kazama; Jun’ichi Tsujii
Data sparseness or overfitting is a serious problem in natural language processing employing machine learning methods. This is still true even for the maximum entropy (ME) method, whose flexible modeling capability has alleviated data sparseness more successfully than the other probabilistic models in many NLP tasks. Although we usually estimate the model so that it completely satisfies the equality constraints on feature expectations with the ME method, complete satisfaction leads to undesirable overfitting, especially for sparse features, since the constraints derived from a limited amount of training data are always uncertain. To control overfitting in ME estimation, we propose the use of box-type inequality constraints, where equality can be violated up to certain predefined levels that reflect this uncertainty. The derived models, inequality ME models, in effect have regularized estimation with L1 norm penalties of bounded parameters. Most importantly, this regularized estimation enables the model parameters to become sparse. This can be thought of as automatic feature selection, which is expected to improve generalization performance further. We evaluate the inequality ME models on text categorization datasets, and demonstrate their advantages over standard ME estimation, similarly motivated Gaussian MAP estimation of ME models, and support vector machines (SVMs), which are one of the state-of-the-art methods for text categorization.
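The key mechanism behind the sparsity claim is that L1-penalized estimation drives small parameters exactly to zero. A minimal sketch of the soft-thresholding (proximal) operator that produces this effect, with illustrative widths and weights (this is the general L1 mechanism, not the paper's estimation procedure):

```python
# Inequality ME estimation behaves like L1-regularized maximum entropy:
# parameters whose constraint can be violated within the slack width are
# driven exactly to zero, which acts as automatic feature selection.

def soft_threshold(w, width):
    """Proximal operator of the L1 penalty: shrink w toward 0 by width."""
    if w > width:
        return w - width
    if w < -width:
        return w + width
    return 0.0  # small weights become exactly zero -> sparse model

weights = [0.8, -0.05, 0.02, -1.3]
shrunk = [soft_threshold(w, 0.1) for w in weights]
# weights with |w| <= 0.1 are zeroed; large weights are merely shrunk
```

Contrast with the Gaussian (L2) MAP penalty, which shrinks weights toward zero but almost never makes them exactly zero, so no features are pruned.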
International Conference on Data Mining | 2009
Stijn De Saeger; Kentaro Torisawa; Jun’ichi Kazama; Kow Kuroda; Masaki Murata
This paper proposes a minimally supervised method for acquiring high-level semantic relations such as causality and prevention from the Web. Our method learns linguistic patterns that express causality, such as “x gave rise to y”, and uses them to extract causal noun pairs like (global warming, malaria epidemic) from sentences like “global warming gave rise to a new malaria epidemic”. The novelty of our method lies in the use of semantic word classes acquired by large-scale clustering for learning class-dependent patterns. We demonstrate the effectiveness of this class-based approach on three large-scale relation mining tasks over 50 million Japanese Web pages. In two of these tasks we obtained more than 30,000 relation instances with over 80% precision, outperforming a state-of-the-art system by a large margin.
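The class-dependence idea can be sketched as a pattern that only fires when both arguments fall in compatible semantic classes. The toy hand-made classes below stand in for the large-scale clustering used in the paper:

```python
import re

# Class-dependent causal pattern matching: "x gave rise to y" yields a
# causal pair only when x and y belong to a compatible class pair.
# Classes and the compatibility set are invented illustrations.
CLASSES = {"global warming": "phenomenon",
           "malaria epidemic": "disease_event",
           "the committee": "organization"}
COMPATIBLE = {("phenomenon", "disease_event")}

def extract_causal(sentence):
    m = re.search(r"(.+?) gave rise to (?:a new )?(.+?)\.", sentence)
    if not m:
        return None
    x, y = m.group(1).lower(), m.group(2).lower()
    if (CLASSES.get(x), CLASSES.get(y)) in COMPATIBLE:
        return (x, y)
    return None

print(extract_causal("Global warming gave rise to a new malaria epidemic."))
# -> ('global warming', 'malaria epidemic')
```

The class check is what filters out spurious matches of the surface pattern, e.g. an organization "giving rise to" a debate would not be emitted as a causal pair here.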
Web Intelligence | 2008
Jong-Hoon Oh; Daisuke Kawahara; Kiyotaka Uchimoto; Jun’ichi Kazama; Kentaro Torisawa
We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We collect candidates of missing cross-language links -- pairs of English and Japanese Wikipedia articles that could be connected by cross-language links. Then we select the correct cross-language links among the candidates using a classifier trained with various types of features. Our method has three desirable characteristics for discovering missing links. First, it can discover cross-language links with high accuracy (92% precision with 78% recall). Second, the features used in the classifier are language-independent. Third, without relying on any external knowledge, we generate the features from resources automatically obtained from Wikipedia. In this work, we discover approximately 10^5 missing cross-language links from Wikipedia, which are almost two-thirds as many as the existing cross-language links in Wikipedia.
Empirical Methods in Natural Language Processing | 2005
Jun’ichi Kazama; Kentaro Torisawa
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Wenliang Chen; Jun’ichi Kazama; Min Zhang; Yoshimasa Tsuruoka; Yujie Zhang; Yiou Wang; Kentaro Torisawa; Haizhou Li
Conference on Computational Natural Language Learning | 2006
Jun’ichi Kazama; Kentaro Torisawa
We present a method for speeding up the calculation of tree kernels during training. The calculation of tree kernels is still heavy even with efficient dynamic programming (DP) procedures. Our method maps trees into a small feature space where the inner product, which can be calculated much faster, yields the same value as the tree kernel for most tree pairs. The training is sped up by using the DP procedure only for the exceptional pairs. We describe an algorithm that detects such exceptional pairs and converts trees into vectors in a feature space. We propose tree kernels on marked labeled ordered trees and show that the training of SVMs for semantic role labeling using these kernels can be sped up by a factor of several tens.
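The speed-up idea can be sketched as follows: each tree is mapped to a sparse feature vector whose inner product reproduces the kernel value for most pairs, and the slower DP kernel is kept only as a fallback for exceptional pairs. The shallow-subtree encoding below is a simplified stand-in for the paper's feature mapping, not its actual algorithm:

```python
from collections import Counter

# Map trees (label, children) to sparse count vectors of shallow subtree
# signatures; the sparse inner product is far cheaper than a DP kernel.

def subtrees(tree):
    """Enumerate shallow signatures (label, child labels) of each node."""
    label, children = tree
    yield (label, tuple(c[0] for c in children))
    for c in children:
        yield from subtrees(c)

def to_vector(tree):
    return Counter(subtrees(tree))

def fast_kernel(v1, v2):
    """Sparse inner product; iterate over the smaller vector."""
    if len(v2) < len(v1):
        v1, v2 = v2, v1
    return sum(c * v2[f] for f, c in v1.items())

t1 = ("S", [("NP", []), ("VP", [("V", []), ("NP", [])])])
t2 = ("S", [("NP", []), ("VP", [("V", []), ("NP", [])])])
print(fast_kernel(to_vector(t1), to_vector(t2)))
# -> 7  (1 for S, 2*2 for the two NP leaves, 1 for VP, 1 for V)
```

Because the vectors are precomputed once per tree, every kernel evaluation during SVM training reduces to a sparse dot product, which is where the reported several-tens speed-up comes from.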
Collaboration
Dive into Jun’ichi Kazama’s collaborations.
National Institute of Information and Communications Technology