Toshiyuki Kanamaru

National Institute of Information and Communications Technology

Publication

Featured research published by Toshiyuki Kanamaru.

international conference natural language processing | 2008

Extraction and visualization of numerical and named entity information from a large number of documents

Masaki Murata; Qing Ma; Kentaro Torisawa; Masakazu Iwatate; Tamotsu Shirado; Koji Ichii; Toshiyuki Kanamaru

We have developed a system that can semi-automatically extract numerical and named entity sets from a large number of Japanese documents and create various kinds of tables and graphs. In our experiments, the system semi-automatically created approximately 300 kinds of graphs and tables at precisions of 0.2-0.8 with only two hours of manual preparation from a two-year stack of newspaper articles. Note that these newspaper articles contained a large quantity of data and could not all be read or checked manually in such a short amount of time. From this perspective, we concluded that our system is useful and convenient for extracting information from a large number of documents.

language resources and evaluation | 2007

Japanese-to-English translations of tense, aspect, and modality using machine-learning methods and comparison with machine-translation systems on market

Masaki Murata; Qing Ma; Kiyotaka Uchimoto; Toshiyuki Kanamaru; Hitoshi Isahara

This paper describes experiments on translating tense, aspect, and modality that were carried out using a variety of machine-learning methods (k-nearest neighbor, decision list, maximum entropy, and support vector machine) and six machine-translation (MT) systems available on the market. We found that all of the machine-learning methods, including the simple string-matching-based k-nearest neighbor method used in a previous study, obtained higher accuracy rates than the MT systems currently available on the market. We also found that the support vector machine obtained the best accuracy rate (98.8%) of these methods. Finally, we analyzed the errors made by the machine-learning methods and the commercially available MT systems and obtained error patterns that should be useful for making future improvements.

international conference on computational linguistics | 2005

Automatic synonym acquisition based on matching of definition sentences in multiple dictionaries

Masaki Murata; Toshiyuki Kanamaru; Hitoshi Isahara

Studies on paraphrasing are important with respect to various research topics such as sentence generation, summarization, and question-answering. We consider the automatic extraction of synonyms (which are a kind of paraphrase) through the matching of word definitions from two dictionaries, and describe a new method for extracting paraphrases. Higher precision was obtained than with a conventional frequency-based method. The new method provided a precision rate of 0.764 for the top 500 data pairs and 0.220 for 500 randomly extracted data pairs when only synonyms were considered a correct answer. It provided a precision rate of 0.974 for the top 500 data pairs and 0.722 for 500 randomly extracted data pairs when hypernyms and similar expressions were also considered correct answers. Our method should be useful for other studies on paraphrase extraction.
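The idea of matching definitions across two dictionaries and scoring the ranked candidates can be illustrated with a minimal sketch. The abstract does not specify how definitions are matched, so the token-overlap (Jaccard) score, the threshold, and the function names below are assumptions for illustration, not the authors' method. The toy dictionaries are hypothetical English entries rather than the Japanese data used in the paper.

```python
def tokens(definition):
    """Return the set of lowercased, whitespace-separated tokens."""
    return set(definition.lower().split())


def jaccard(a, b):
    """Token-overlap score between two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(dict_a, dict_b, threshold=0.5):
    """Pair words from two dictionaries whose definitions overlap strongly.

    Returns (score, word_a, word_b) tuples sorted by descending score.
    """
    pairs = []
    for word_a, def_a in dict_a.items():
        for word_b, def_b in dict_b.items():
            if word_a == word_b:
                continue  # the same headword is not a synonym candidate
            score = jaccard(tokens(def_a), tokens(def_b))
            if score >= threshold:
                pairs.append((score, word_a, word_b))
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    return pairs


def precision_at_k(pairs, gold, k):
    """Fraction of the top-k candidate pairs that appear in the gold set."""
    top = pairs[:k]
    if not top:
        return 0.0
    correct = sum(1 for _, a, b in top if (a, b) in gold)
    return correct / len(top)


# Hypothetical toy dictionaries (not data from the paper).
dict_a = {"car": "a road vehicle with an engine", "happy": "feeling pleasure"}
dict_b = {
    "automobile": "a road vehicle with an engine",
    "glad": "feeling pleasure or joy",
    "car": "a road vehicle with an engine",
}
pairs = candidate_pairs(dict_a, dict_b)
gold = {("car", "automobile")}
```

Evaluating the top-ranked pairs separately from a random sample, as the abstract does, shows how much the ranking itself contributes: `precision_at_k(pairs, gold, 1)` looks only at the best-scoring candidate, while a larger `k` admits weaker matches.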

international conference natural language processing | 2008

Analysis of the degree of importance of information using newspapers and questionnaires

Masaki Murata; Toshiyuki Kanamaru; Ryo Nishimura; Kentaro Torisawa; Kouichi Doi

Our objective is to estimate and clarify the factors that determine the degree of importance of information by extracting the words that characterize the degree of importance, and to construct a system for automatically estimating this degree of importance. We studied the degree of importance of information by using machine learning. We first performed experiments using newspaper documents (Dn). In these experiments, we assumed that a document on the front page or at the top of the front page is important. We were able to identify important documents with a precision of 0.9 by using machine learning. We found that in the case of a newspaper, the degree of importance can be estimated with high precision. Next, to estimate the degree of importance that people attach to a document, we conducted experiments using questionnaire data (Dq) as test data. In these experiments, the subjects were asked to identify which document from a pair was more important, and a high accuracy of 94% was obtained with more than 80% of them responding with the same answer. Furthermore, on using newspaper documents (Dn) as training data, we could obtain (i) the same accuracy by using Dn only instead of using Dn with Dq and (ii) a higher accuracy on using Dn and Dq instead of using Dq only. This observation is useful because preparing questionnaire data (Dq) can be an expensive process, whereas Dn is free. Finally, we extracted the characteristic words that differentiated important information from less important information by calculating the parameters of the features in machine learning (the maximum entropy (ME) method).

meeting of the association for computational linguistics | 2006

Machine-Learning-Based Transformation of Passive Japanese Sentences into Active by Separating Training Data into Each Input Particle

Masaki Murata; Toshiyuki Kanamaru; Tamotsu Shirado; Hitoshi Isahara

We developed a new method of transforming Japanese case particles when transforming Japanese passive sentences into active sentences. It separates the training data by input particle and applies machine learning to each particle separately. We also used numerous rich features for learning. Our method obtained a high rate of accuracy (94.30%). In contrast, a method that did not separate the training data by input particle obtained a lower rate of accuracy (92.00%). In addition, a method used in a previous study (Murata and Isahara, 2003), which did not have many rich features for learning, obtained a much lower accuracy rate (89.77%). We confirmed that these improvements were significant through a statistical test. We also conducted experiments utilizing traditional methods using verb dictionaries and manually prepared heuristic rules and confirmed that our method obtained much higher accuracy rates than traditional methods.
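Separating the training data by input particle can be sketched as follows. This is a minimal illustration under stated assumptions: a simple frequency table stands in for the machine learner (which the abstract does not name), a single verb string stands in for the "rich features", and the romanized toy examples are hypothetical rather than taken from the paper's data.

```python
from collections import Counter, defaultdict


def train_per_particle(examples):
    """Build one frequency model per input particle.

    examples: iterable of (input_particle, feature, output_particle).
    Returns per-particle feature counts and per-particle label counts.
    """
    by_feature = defaultdict(lambda: defaultdict(Counter))
    by_particle = defaultdict(Counter)
    for particle, feature, target in examples:
        by_feature[particle][feature][target] += 1
        by_particle[particle][target] += 1
    return by_feature, by_particle


def predict(model, particle, feature):
    """Predict the active-sentence particle using only that particle's model."""
    by_feature, by_particle = model
    if particle in by_feature and feature in by_feature[particle]:
        return by_feature[particle][feature].most_common(1)[0][0]
    if particle in by_particle:
        # Unseen feature: fall back to the majority label for this particle.
        return by_particle[particle].most_common(1)[0][0]
    return particle  # Unseen particle: leave it unchanged.


# Hypothetical toy data: (particle in passive, verb, particle in active).
examples = [
    ("ni", "naguru", "ga"),
    ("ni", "homeru", "ga"),
    ("ni", "okuru", "ga"),
    ("ga", "naguru", "wo"),
    ("ga", "homeru", "wo"),
    ("ga", "okuru", "ni"),
]
model = train_per_particle(examples)
```

Because each particle has its own model, evidence about how "ni" transforms never influences the prediction for "ga", which is the separation the abstract credits for the gain from 92.00% to 94.30%.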

Ampersand | 2015