Masato Hagiwara
Nagoya University
Publication
Featured research published by Masato Hagiwara.
meeting of the association for computational linguistics | 2008
Masato Hagiwara
Distributional similarity has been widely used to capture the semantic relatedness of words in many NLP tasks. However, various parameters such as similarity measures must be hand-tuned to make it work effectively. Instead, we propose a novel approach to synonym identification based on supervised learning and distributional features, which correspond to the commonality of individual context types shared by word pairs. Considering the integration with pattern-based features, we have built and compared five synonym classifiers. The evaluation experiment has shown a dramatic performance increase of over 120% on the F-1 measure basis, compared to the conventional similarity-based classification. On the other hand, the pattern-based features have appeared almost redundant.
international joint conference on natural language processing | 2005
Masato Hagiwara; Yasuhiro Ogawa; Katsuhiko Toyama
When acquiring synonyms from large corpora, it is important to deal not only with such surface information as the context of the words but also their latent semantics. This paper describes how to utilize a latent semantic model PLSI to acquire synonyms automatically from large corpora. PLSI has been shown to achieve a better performance than conventional methods such as tf·idf and LSI, making it applicable to automatic thesaurus construction. Also, various PLSI techniques have been shown to be effective including: (1) use of Skew Divergence as a distance/similarity measure; (2) removal of words with low frequencies, and (3) multiple executions of PLSI and integration of the results.
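As an illustration of technique (1), the sketch below computes skew divergence between two discrete distributions, using the common definition KL(r || alpha*q + (1 - alpha)*r). The function names, the list-based representation, and the default alpha = 0.99 are assumptions made for this example, not details taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Terms where p[i] == 0 contribute nothing and are skipped.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def skew_divergence(q, r, alpha=0.99):
    """Skew divergence of q from r: KL(r || alpha*q + (1 - alpha)*r).

    Mixing a little of r into q keeps the denominator positive wherever
    r is positive, so the value stays finite even if q has zeros there.
    alpha = 0.99 is an assumed default for this sketch.
    """
    mixed = [alpha * qi + (1 - alpha) * ri for qi, ri in zip(q, r)]
    return kl_divergence(r, mixed)
```

Identical distributions give a value of approximately zero, and, unlike plain KL divergence, the measure stays finite when the two distributions have disjoint support. Note that it is asymmetric in its arguments.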
international conference on computational linguistics | 2008
Nobuyuki Shimizu; Masato Hagiwara; Yasuhiro Ogawa; Katsuhiko Toyama; Hiroshi Nakagawa
The distance or similarity metric plays an important role in many natural language processing (NLP) tasks. Previous studies have demonstrated the effectiveness of a number of metrics such as the Jaccard coefficient, especially in synonym acquisition. While the existing metrics perform quite well, to further improve performance, we propose the use of a supervised machine learning algorithm that fine-tunes them. Given the known instances of similar or dissimilar words, we estimated the parameters of the Mahalanobis distance. We compared a number of metrics in our experiments, and the results show that the proposed metric has a higher mean average precision than other metrics.
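For reference, the Mahalanobis distance mentioned above has the general form sqrt((x - y)^T A (x - y)), where A is a positive semi-definite matrix. The sketch below only evaluates that form for a given A; how the paper estimates A from similar and dissimilar word pairs is not shown here, and the function name and list-of-lists matrix layout are assumptions for illustration.

```python
import math


def mahalanobis(x, y, A):
    """Return sqrt((x - y)^T A (x - y)) for vectors x, y and matrix A.

    A is a list of rows and is assumed to be positive semi-definite,
    so the quantity under the square root is non-negative.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    # Matrix-vector product A * d, one row at a time.
    Ad = [sum(a * dj for a, dj in zip(row, d)) for row in A]
    return math.sqrt(sum(di * adi for di, adi in zip(d, Ad)))
```

With A set to the identity matrix this reduces to the ordinary Euclidean distance. A learned A reweights and correlates the feature dimensions, which is what allows a supervised method to tune the metric.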
New Frontiers in Artificial Intelligence | 2009
Masato Hagiwara; Yasuhiro Ogawa; Katsuhiko Toyama
Recent demands for translating Japanese statutes into foreign languages necessitate the compilation of standard bilingual dictionaries. To support this costly task, we propose a bootstrapping-based lexical knowledge extraction algorithm, Monaka, to automatically extract dictionary term candidates from unsegmented Japanese legal text. The algorithm is based on the Tchai algorithm and extracts reliable patterns and instances in an iterative manner, but instead uses character n-grams as contextual patterns, and introduces a special constraint to ensure proper segmentation of the extracted terms. The experimental results show that this algorithm can extract correctly segmented and important dictionary terms with higher accuracy compared to conventional methods.
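To make the idea of character n-grams as contextual patterns concrete, the sketch below extracts character n-grams from unsegmented text and the n characters on either side of a candidate term. Both helper names are hypothetical, and this is a generic illustration rather than the Monaka algorithm itself; the bootstrapping loop and the segmentation constraint are not reproduced.

```python
def char_ngrams(text, n):
    """Return all contiguous character n-grams of text, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def context_ngrams(text, start, end, n):
    """Return the (left, right) character contexts of text[start:end].

    Each context is at most n characters long and is shorter when the
    candidate term sits near the beginning or end of the text.
    """
    left = text[max(0, start - n):start]
    right = text[end:end + n]
    return left, right
```

Because no word segmentation is needed, this kind of context works directly on raw Japanese text: the left and right strings around a candidate span can serve as a pattern for matching other spans that occur in the same surroundings.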
international joint conference on natural language processing | 2015
Ayah Zirikly; Masato Hagiwara
We propose an approach to cross-lingual named entity recognition model transfer without the use of parallel corpora. In addition to global de-lexicalized features, we introduce multilingual gazetteers that are generated using graph propagation, and cross-lingual word representation mappings without the use of parallel data. We target the e-commerce domain, which is challenging due to its unstructured and noisy nature. The experiments have shown that our approaches beat the strong MT baseline, where the English model is transferred to two languages: Spanish and Chinese.
north american chapter of the association for computational linguistics | 2009
Masato Hagiwara; Hisami Suzuki
We propose a unified approach to web search query alterations in Japanese that is not limited to particular character types or orthographic similarity between a query and its alteration candidate. Our model is based on previous work on English query correction, but makes some crucial improvements: (1) we augment the query-candidate list to include orthographically dissimilar but semantically similar pairs; and (2) we use kernel-based lexical semantic similarity to avoid the problem of data sparseness in computing query-candidate similarity. We also propose an efficient method for generating query-candidate pairs for model training and testing. We show that the proposed method achieves about 80% accuracy on the query alteration task, improving over previously proposed methods that use semantic similarity.
natural language processing and knowledge engineering | 2009
Yusuke Arai; Masato Hagiwara; Yasuhiro Ogawa; Katsuhiko Toyama
We propose KWISC, a dependency visualization method for understanding context. KWISC displays the dependencies between bunsetsus in Japanese sentences. In particular, it splits sentences into bunsetsus and displays them hierarchically according to their dependency depth. In addition, KWISC can align on two keywords at once, whereas KWIC aligns on only one, which helps to find collocations among words. Furthermore, KWISC can expand and collapse bunsetsus. Collapsing shows only the main structure of a sentence and so shortens the distance between collocating words. Collocating words therefore fit on the screen more easily, and less horizontal eye movement is needed when looking for collocations. As an evaluation experiment, we collected pairs consisting of a bunsetsu that includes a keyword and the bunsetsu on which it depends from 207,802 sentences in the EDR corpus, and measured the distances between them in KWISC. The results confirm that KWISC makes these distances shorter than conventional methods do, which makes collocations easier to find.
international joint conference on natural language processing | 2011
Graham Neubig; Yuichiroh Matsubayashi; Masato Hagiwara; Koji Murakami
meeting of the association for computational linguistics | 2006
Masato Hagiwara; Yasuhiro Ogawa; Katsuhiko Toyama
Journal of Information Processing | 2009
Masato Hagiwara; Yasuhiro Ogawa; Katsuhiko Toyama