Hitoshi Isahara
Ministry of Posts and Telecommunications
Publication
Featured research published by Hitoshi Isahara.
Conference of the European Chapter of the Association for Computational Linguistics | 1999
Kiyotaka Uchimoto; Satoshi Sekine; Hitoshi Isahara
This paper describes a dependency structure analysis of Japanese sentences based on maximum entropy models. Our model learns the weights of a set of features from a training corpus in order to predict the dependencies between bunsetsus (phrasal units). The dependency accuracy of our system is 87.2% on the Kyoto University corpus. We discuss the contribution of each feature set and the relationship between the amount of training data and the accuracy.
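As a rough illustration of how a maximum entropy model can score the dependency between two bunsetsus, the sketch below uses a conditional log-linear model over two outcomes ("depends" / "not_depends") with hand-set weights for a few binary features. The feature names, weights, and the helper functions `dependency_probability` and `best_modifiee` are hypothetical; the paper's actual feature set and training procedure are not reproduced here, and in practice the weights would be learned from the corpus.

```python
import math

# Hypothetical weights for (outcome, feature) pairs. A real maximum entropy
# model would learn these from a training corpus.
WEIGHTS = {
    ("depends", "adjacent"): 1.2,
    ("depends", "head_is_verb"): 0.8,
    ("depends", "comma_between"): -0.9,
    ("not_depends", "adjacent"): -0.3,
}
OUTCOMES = ("depends", "not_depends")


def dependency_probability(features):
    """P(depends | features) = exp(sum of active weights) / Z, normalised over OUTCOMES."""
    scores = {
        y: math.exp(sum(WEIGHTS.get((y, f), 0.0) for f in features))
        for y in OUTCOMES
    }
    return scores["depends"] / sum(scores.values())


def best_modifiee(candidates):
    """Return the candidate index with the highest dependency probability.

    `candidates` maps a bunsetsu index to its set of active features. Because
    Japanese is head-final, candidates are assumed to lie to the right of the
    modifier.
    """
    return max(candidates, key=lambda j: dependency_probability(candidates[j]))


if __name__ == "__main__":
    cands = {
        1: {"adjacent"},
        2: {"head_is_verb"},
        3: {"head_is_verb", "comma_between"},
    }
    for j, feats in cands.items():
        print(j, round(dependency_probability(feats), 3))
    print("chosen modifiee:", best_modifiee(cands))
```

With no active features both outcomes score equally, so the probability is 0.5; each learned weight then shifts the decision toward or away from attachment.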
ACM Transactions on Asian Language Information Processing | 2002
Masaki Murata; Qing Ma; Hitoshi Isahara
The elastic-input neuro-tagger and the hybrid tagger, which combines a neural network with Brill's error-driven learning, have already been proposed as ways to construct a practical tagger from as little training data as possible. When trained on a small Thai corpus, these taggers achieve tagging accuracies of 94.4% and 95.5%, respectively (counting only words whose part of speech is ambiguous). In this study, to construct more accurate taggers, we developed new tagging methods based on three machine-learning approaches: decision lists, maximum entropy, and support vector machines. We then performed tagging experiments with them. Our results show that the support vector machine method achieves the best precision (96.1%) and can improve tagging accuracy for Thai. The improvement in accuracy was also confirmed with a statistical test (a sign test). Finally, we theoretically examined all these methods to determine how the improvements were achieved, and found that they were due to our use of word information, which is helpful for tagging, and to the strong performance of the support vector machine.
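The abstract mentions confirming the improvement with a sign test. A minimal sketch of the standard exact two-sided sign test is shown below, assuming the comparison is made over discordant items only: words one tagger gets right and the other gets wrong. Whether the paper counted words, sentences, or some other unit is not stated, and the function name `sign_test` is hypothetical.

```python
from math import comb


def sign_test(wins_a, wins_b):
    """Exact two-sided sign test.

    wins_a: items tagged correctly by tagger A but not by tagger B.
    wins_b: items tagged correctly by tagger B but not by tagger A.
    Ties (both right or both wrong) are discarded, as is usual for the sign
    test. Under H0 each discordant item favours either tagger with
    probability 0.5, so the count follows Binomial(n, 0.5).
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, `sign_test(30, 12)` gives the two-sided p-value for 30 words where only the new tagger was right versus 12 where only the old one was; a small value indicates the difference is unlikely to be chance.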
International Conference on Computational Linguistics | 2000
Kiyotaka Uchimoto; Masaki Murata; Qing Ma; Satoshi Sekine; Hitoshi Isahara
In this paper we describe a method of acquiring word order from corpora. Word order is defined as the order of modifiers, that is, the order of phrasal units called bunsetsus that depend on the same modifiee. The method uses a model that automatically discovers the tendencies of Japanese word order by using various kinds of information in and around the target bunsetsus. The model shows to what extent each piece of information contributes to deciding the word order and which word order tends to be selected when several kinds of information conflict. The contribution rate of each piece of information is efficiently learned by a model within a maximum entropy framework. The performance of the trained model can be evaluated by checking how many of the word orders selected by the model agree with those in the original text. We also show that even a raw, untagged corpus can be used to train the model if it is first analyzed by a parser. This is possible because the word order of the text in the corpus is correct.
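The evaluation described above (counting how often the model's preferred order agrees with the original text) can be sketched with a pairwise two-outcome log-linear model. Decomposing the task into pairs of modifiers, the feature names, the weights, and the functions `prob_a_first` and `agreement_rate` are all assumptions for illustration, not the paper's formulation.

```python
import math
from itertools import combinations

# Hypothetical weights: a larger total score pushes a bunsetsu toward the front.
WEIGHTS = {"topic_wa": 1.5, "long_bunsetsu": 0.7, "has_comma": 0.4, "object_wo": -0.6}


def score(bunsetsu):
    return sum(WEIGHTS.get(f, 0.0) for f in bunsetsu["features"])


def prob_a_first(a, b):
    """P(a precedes b) = exp(s_a) / (exp(s_a) + exp(s_b)), written in logistic form."""
    return 1.0 / (1.0 + math.exp(score(b) - score(a)))


def agreement_rate(sentences):
    """Fraction of modifier pairs whose original order the model prefers.

    Each sentence is a list of bunsetsus that depend on the same modifiee,
    in the order they appear in the original text (assumed to be correct).
    """
    agree = total = 0
    for mods in sentences:
        for a, b in combinations(mods, 2):  # a precedes b in the original text
            total += 1
            if prob_a_first(a, b) > 0.5:
                agree += 1
    return agree / total if total else 0.0
```

Because the reference order comes straight from the text, any parsed corpus supplies evaluation (and training) pairs without manual annotation, which is the point the abstract makes about raw corpora.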
International Symposium on Neural Networks | 1999
Qing Ma; Kiyotaka Uchimoto; Masaki Murata; Hitoshi Isahara
This paper presents a part-of-speech (POS) neuro tagger consisting of a three-layer perceptron with elastic input. Computer experiments show that the neuro tagger achieves an accuracy of 94.4% on ambiguous words when trained on a small Thai corpus containing 22,311 ambiguous words. A series of comparative experiments further shows that the neuro tagger is far superior to statistical models, including the frequency model (a baseline), a local n-gram model, and an HMM.
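The sketch below shows only the forward pass of a generic three-layer perceptron tagger whose input is a window of context-word vectors. Reading "elastic input" as a window whose left and right lengths can change (which changes the input-layer size) is our assumption; the tag set, the per-word vector encoding, and the `WindowTagger` class are hypothetical. The weights are random and untrained (training by back-propagation is omitted), so the predicted tag carries no meaning until the network is trained.

```python
import math
import random

TAGS = ["NOUN", "VERB", "ADJ"]  # hypothetical tag set


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(n_in, n_out, rng):
    """Random weights; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, x):
    # zip stops at len(x), so row[-1] is used only as the bias term.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in layer]


class WindowTagger:
    """Input layer -> hidden layer -> output layer (one unit per tag).

    The input concatenates one vector per word in the window
    [i - left, i + right]; each vector is assumed to be that word's
    lexical distribution over TAGS. Changing left/right resizes the input.
    """

    def __init__(self, left, right, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.left, self.right = left, right
        n_in = (left + 1 + right) * len(TAGS)
        self.hidden = make_layer(n_in, n_hidden, rng)
        self.output = make_layer(n_hidden, len(TAGS), rng)

    def encode(self, sentence, i):
        pad = [0.0] * len(TAGS)  # used for positions outside the sentence
        x = []
        for j in range(i - self.left, i + self.right + 1):
            x.extend(sentence[j] if 0 <= j < len(sentence) else pad)
        return x

    def tag(self, sentence, i):
        out = forward(self.output, forward(self.hidden, self.encode(sentence, i)))
        return TAGS[max(range(len(TAGS)), key=out.__getitem__)]
```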
International Conference on Computational Linguistics | 2000
Qing Ma; Masaki Murata; Kiyotaka Uchimoto; Hitoshi Isahara
A hybrid system for part-of-speech tagging is described that consists of a neuro tagger and a rule-based corrector. The neuro tagger is an initial-state annotator that uses contexts of different lengths, giving priority to the longest context. Its inputs are weighted by information gains obtained through information maximization. The rule-based corrector consists of a set of transformation rules that compensate for the shortcomings of the neuro tagger. Computer experiments show that almost 20% of the errors made by the neuro tagger are corrected by these transformation rules, so that the hybrid system reaches an accuracy of 95.5% counting only ambiguous words and 99.1% counting all words when trained on a small Thai corpus with 22,311 ambiguous words. This accuracy is far higher than that of an HMM and also higher than that of a rule-based model.
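The correction step can be pictured as Brill-style transformation rules applied, in order, to the output of an initial annotator. In this sketch a hypothetical lexicon lookup stands in for the neuro tagger, the rules only look at the previous tag, and English words are used for readability; the rule format and the names `LEXICON`, `initial_tags`, `apply_rules`, and `RULES` are assumptions, not the paper's rule templates.

```python
# Hypothetical lexicon standing in for the neuro tagger (initial-state annotator).
LEXICON = {"the": "DET", "can": "VERB", "rusts": "VERB"}


def initial_tags(words):
    return [LEXICON.get(w, "NOUN") for w in words]


def apply_rules(tags, rules):
    """Apply (from_tag, to_tag, prev_tag) rules in order.

    Each rule scans left to right and rewrites tags in place, so a later
    position can be affected by a correction made earlier in the same pass,
    and later rules see the output of earlier ones.
    """
    tags = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


# Hypothetical learned rule: "a VERB right after a DET is really a NOUN".
RULES = [("VERB", "NOUN", "DET")]
```

For "the can rusts", the initial tags `DET VERB VERB` become `DET NOUN VERB`: only the word right after the determiner is rewritten, which is the kind of targeted fix a corrector is meant to add on top of the initial tagger.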
ACM Symposium on Applied Computing | 2000
Hiromi Ozaku; Kiyotaka Uchimoto; Masaki Murata; Hitoshi Isahara
SAC ’00, Villa Olmo, Como, Italy. Copyright 2000 ACM.
International Conference on Computational Linguistics | 2000
Masaki Murata; Kiyotaka Uchimoto; Qing Ma; Hitoshi Isahara
[…] strewn around many newsgroups. Nevertheless, network news contains a great deal of useful information, and extracting the necessary information from it can be crucial to users. We have thus started to build […]
International Conference on Artificial Neural Networks | 1996
Qing Ma; Hitoshi Isahara; Hiromi Ozaku
This paper describes two new bunsetsu identification methods using supervised learning. Since Japanese syntactic analysis is usually done after bunsetsu identification, bunsetsu identification is important for analyzing Japanese sentences. In experiments comparing the four previously available machine-learning methods (decision tree, maximum-entropy method, example-based approach and decision list) and two new methods using category-exclusive rules, the new method using the category-exclusive rules with the highest similarity performed best.
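One of the four baselines named above is the decision list. A minimal generic decision list for bunsetsu identification is sketched here, framing the task as labelling each morpheme "B" (starts a new bunsetsu) or "I" (inside one). The B/I framing, the smoothed log-ratio score, the smoothing constant `alpha`, and the feature strings are assumptions for illustration; the paper's category-exclusive rules are not reproduced.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, alpha=0.1):
    """examples: list of (feature_set, label). Returns rules sorted by reliability.

    Each rule is (feature, label, score), where label is the feature's majority
    label and score is a smoothed log ratio of majority to non-majority counts.
    """
    counts = defaultdict(Counter)
    for feats, label in examples:
        for f in feats:
            counts[f][label] += 1
    rules = []
    for f, c in counts.items():
        label, n = c.most_common(1)[0]
        rest = sum(c.values()) - n
        rules.append((f, label, math.log((n + alpha) / (rest + alpha))))
    rules.sort(key=lambda r: r[2], reverse=True)
    return rules


def classify(rules, feats, default):
    """Return the label of the first (most reliable) rule whose feature is present."""
    for f, label, _ in rules:
        if f in feats:
            return label
    return default
```

Only the single most reliable matching feature decides each label, which makes the list easy to inspect but ignores interactions between features; that trade-off is one reason such baselines are compared against other learners.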
The Journal of The Acoustical Society of Japan (e) | 1999
Virach Sornlertlamvanich; Naoto Takahashi; Hitoshi Isahara
A cost-effective method for part-of-speech (POS) tagging of a Thai corpus using neural networks is proposed. Computer experiments show that this method has a success rate of over 80% when tagging text not included in the training data, and an error rate below 8%. These results are much better than those obtained by conventional table-lookup methods. Experiments comparing the original back-propagation algorithm with various modified versions for training the neural network tagger are also reported. Their results show that the learning algorithm with the DBDB adaptation rule in semi-batch update mode is the best for tagging in terms of convergence rate and computational complexity.
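The abstract does not expand "DBDB". Purely as an assumption, the sketch below shows the well-known delta-bar-delta learning-rate adaptation rule (Jacobs, 1988), which may or may not be what DBDB refers to, combined with a semi-batch update mode: the weight is updated after each block of patterns rather than after every pattern (online) or once per epoch (batch). It trains a single linear unit on toy data rather than a tagger, and `train_dbd` and all hyperparameter values are hypothetical.

```python
def train_dbd(patterns, block_size=2, epochs=200,
              lr0=0.005, kappa=0.001, phi=0.5, theta=0.7):
    """Fit y ~ w * x with delta-bar-delta step sizes and semi-batch updates."""
    w, lr, bar = 0.0, lr0, 0.0  # bar: exponential average of past gradients
    for _ in range(epochs):
        for start in range(0, len(patterns), block_size):
            block = patterns[start:start + block_size]
            # Gradient of 0.5 * sum((w*x - y)^2) over this block only.
            grad = sum((w * x - y) * x for x, y in block)
            if bar * grad > 0:
                lr += kappa          # same sign as the recent trend: grow additively
            elif bar * grad < 0:
                lr *= 1.0 - phi      # sign flipped: shrink multiplicatively
            w -= lr * grad
            bar = (1.0 - theta) * grad + theta * bar
    return w
```

Growing the step size additively but shrinking it multiplicatively lets it speed up along consistent directions while backing off quickly after an overshoot; on the toy data `[(1, 2), (2, 4), (3, 6), (4, 8)]` the weight approaches 2.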
Journal of Natural Language Processing | 1999
Qing Ma; Hitoshi Isahara
Collaboration
National Institute of Information and Communications Technology