
Publications


Featured research published by Jong-Hoon Oh.


Journal of Artificial Intelligence Research | 2006

A comparison of different machine transliteration models

Jong-Hoon Oh; Key-Sun Choi; Hitoshi Isahara

Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models - grapheme-based transliteration model, phoneme-based transliteration model, hybrid transliteration model, and correspondence-based transliteration model - have been proposed by several researchers. To date, however, there has been little research on a framework in which multiple transliteration models can operate simultaneously. Furthermore, there has been no comparison of the four models within the same framework and using the same data. We addressed these problems by 1) modeling the four models within the same framework, 2) comparing them under the same conditions, and 3) developing a way to improve machine transliteration through this comparison. Our comparison showed that the hybrid and correspondence-based models were the most effective and that the four models can be used in a complementary manner to improve machine transliteration performance.
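As a rough, hypothetical illustration of what the simplest of the four model types does, the sketch below maps source grapheme chunks directly to target graphemes using a tiny hand-written rule table with greedy longest-match segmentation. The rule table and the resulting output are invented for illustration; the models in the paper are learned from bilingual data, not hand-written.

```python
# Toy grapheme-based transliteration (hypothetical rules, not the
# paper's learned models): segment the source word into grapheme
# chunks by greedy longest match and map each chunk to a target
# grapheme string.
RULES = {
    "co": "コ", "m": "ム", "pu": "プ", "ter": "ター",
}

def transliterate(word: str, rules: dict[str, str]) -> str:
    out, i = [], 0
    while i < len(word):
        # Try the longest grapheme chunk first (up to 3 characters).
        for n in range(min(3, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            raise KeyError(f"no rule for {word[i:]!r}")
    return "".join(out)

print(transliterate("computer", RULES))  # コムプター
```

A phoneme-based model would instead transliterate the word's pronunciation, and the hybrid and correspondence-based models combine or align the two information sources.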


Web Intelligence | 2008

Enriching Multilingual Language Resources by Discovering Missing Cross-Language Links in Wikipedia

Jong-Hoon Oh; Daisuke Kawahara; Kiyotaka Uchimoto; Jun’ichi Kazama; Kentaro Torisawa

We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We collect candidates of missing cross-language links -- pairs of English and Japanese Wikipedia articles that could be connected by cross-language links. Then we select the correct cross-language links among the candidates by using a classifier trained with various types of features. Our method has three desirable characteristics for discovering missing links. First, our method can discover cross-language links with high accuracy (92% precision with 78% recall). Second, the features used in the classifier are language-independent. Third, without relying on any external knowledge, we generate the features based on resources automatically obtained from Wikipedia. In this work, we discover approximately 10^5 missing cross-language links from Wikipedia, which are almost two-thirds as many as the existing cross-language links in Wikipedia.


ACM Transactions on Asian Language Information Processing | 2006

A machine transliteration model based on correspondence between graphemes and phonemes

Jong-Hoon Oh; Key-Sun Choi; Hitoshi Isahara

Machine transliteration is an automatic method for converting words in one language into phonetically equivalent ones in another language. There has been growing interest in the use of machine transliteration to assist machine translation and information retrieval. Three types of machine transliteration models---grapheme-based, phoneme-based, and hybrid---have been proposed. Surprisingly, there have been few reports of efforts to utilize the correspondence between source graphemes and source phonemes, although this correspondence plays an important role in machine transliteration. Furthermore, little work has been reported on ways to dynamically handle source graphemes and phonemes. In this paper, we propose a transliteration model that dynamically uses both graphemes and phonemes, particularly the correspondence between them. With this model, we have achieved better performance---improvements of about 15 to 41% in English-to-Korean transliteration and about 16 to 44% in English-to-Japanese transliteration---than has been reported for other models.


International Joint Conference on Natural Language Processing | 2009

Bilingual Co-Training for Monolingual Hyponymy-Relation Acquisition

Jong-Hoon Oh; Kiyotaka Uchimoto; Kentaro Torisawa

This paper proposes a novel framework called bilingual co-training for a large-scale, accurate acquisition method for monolingual semantic knowledge. In this framework, we combine the independent processes of monolingual semantic-knowledge acquisition for two languages using bilingual resources to boost performance. We apply this framework to large-scale hyponymy-relation acquisition from Wikipedia. Experimental results show that our approach improved the F-measure by 3.6--10.3%. We also show that bilingual co-training enables us to build classifiers for two languages in tandem with the same combined amount of data as required for training a single classifier in isolation while achieving superior performance.


International Universal Communication Symposium | 2010

Generating information-rich taxonomy from Wikipedia

Ichiro Yamada; Chikara Hashimoto; Jong-Hoon Oh; Kentaro Torisawa; Kow Kuroda; Stijn De Saeger; Masaaki Tsuchida; Jun’ichi Kazama

Even though hyponymy relation acquisition has been extensively studied, “how informative such acquired hyponymy relations are” has not been sufficiently discussed. We found that the hypernyms in automatically acquired hyponymy relations were often too vague or ambiguous to specify the meaning of their hyponyms. For instance, hypernym work is vague and ambiguous in hyponymy relations work/Avatar and work/The Catcher in the Rye. In this paper, we propose a simple method of generating intermediate concepts of hyponymy relations that can make such (vague) hypernyms more specific. Our method generates such an information-rich hyponymy relation as work / work by film director / work by James Cameron / Avatar from the less informative relation work/Avatar. Furthermore, the generated relation work by film director/Avatar can be paraphrased into a new relation movie/Avatar. Experiments showed that our method successfully acquired 2,719,441 enriched hyponymy relations with one intermediate concept with 0.853 precision and another 6,347,472 hyponymy relations with 0.786 precision.


Empirical Methods in Natural Language Processing | 2015

Intra-sentential Zero Anaphora Resolution using Subject Sharing Recognition

Ryu Iida; Kentaro Torisawa; Chikara Hashimoto; Jong-Hoon Oh; Julien Kloetzer

In this work, we improve the performance of intra-sentential zero anaphora resolution in Japanese using a novel method of recognizing subject sharing relations. In Japanese, a large portion of intra-sentential zero anaphora can be regarded as subject sharing relations between predicates, that is, the subject of some predicate is also the unrealized subject of other predicates. We develop an accurate recognizer of subject sharing relations for pairs of predicates in a single sentence, and then construct a subject shared predicate network, which is a set of predicates that are linked by the subject sharing relations recognized by our recognizer. We finally combine our zero anaphora resolution method exploiting the subject shared predicate network and a state-of-the-art ILP-based zero anaphora resolution method. Our combined method achieved a significant improvement over the ILP-based method alone on intra-sentential zero anaphora resolution in Japanese. To the best of our knowledge, this is the first work to explicitly use an independent subject sharing recognizer in zero anaphora resolution.


International Conference on the Computer Processing of Oriental Languages | 2006

Improving machine transliteration performance by using multiple transliteration models

Jong-Hoon Oh; Key-Sun Choi; Hitoshi Isahara

Machine transliteration has received significant attention as a supporting tool for machine translation and cross-language information retrieval. During the last decade, four kinds of transliteration model have been studied -- grapheme-based model, phoneme-based model, hybrid model, and correspondence-based model. These models are classified in terms of the information sources for transliteration or the units to be transliterated -- source graphemes, source phonemes, both source graphemes and source phonemes, and the correspondence between source graphemes and phonemes, respectively. Although each transliteration model has shown relatively good performance, one model alone has limitations on handling complex transliteration behaviors. To address the problem, we combined different transliteration models with a “generating transliterations followed by their validation” strategy. The strategy makes it possible to consider complex transliteration behaviors using the strengths of each model and to improve transliteration performance by validating transliterations. Our method makes use of web-based and transliteration model-based validation for transliteration validation. Experiments showed that our method outperforms both the individual transliteration models and previous work.


Empirical Methods in Natural Language Processing | 2009

Can Chinese Phonemes Improve Machine Transliteration?: A Comparative Study of English-to-Chinese Transliteration Models

Jong-Hoon Oh; Kiyotaka Uchimoto; Kentaro Torisawa

Inspired by the success of English grapheme-to-phoneme research in speech synthesis, many researchers have proposed phoneme-based English-to-Chinese transliteration models. However, such approaches have severely suffered from the errors in Chinese phoneme-to-grapheme conversion. To address this issue, we propose a new English-to-Chinese transliteration model and make systematic comparisons with the conventional models. Our proposed model relies on the joint use of Chinese phonemes and their corresponding English graphemes and phonemes. Experiments showed that Chinese phonemes in our proposed model can contribute to the performance improvement in English-to-Chinese transliteration.


IEICE Transactions on Information and Systems | 2006

An Alignment Model for Extracting English-Korean Translations of Term Constituents

Jong-Hoon Oh; Key-Sun Choi; Hitoshi Isahara


Empirical Methods in Natural Language Processing | 2012

Excitatory or Inhibitory: A New Semantic Orientation Extracts Contradiction and Causality from the Web

Chikara Hashimoto; Kentaro Torisawa; Stijn De Saeger; Jong-Hoon Oh; Jun’ichi Kazama
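The bilingual co-training framework described in the IJCNLP 2009 abstract above can be sketched roughly as follows. Everything here is hypothetical illustration data: two stand-in monolingual "classifiers" trade confidently labeled examples through bilingual links, so a label seeded in one language grows the other language's training set.

```python
# Toy sketch of bilingual co-training (hypothetical data, not the
# paper's system): confident labels from one language's classifier
# are projected through bilingual links to the other language.

def train(labeled):
    """A stand-in 'classifier': memorizes labels by instance key."""
    return dict(labeled)

def predict(model, x):
    """Return (label, confidence); unseen instances get zero confidence."""
    if x in model:
        return model[x], 1.0
    return None, 0.0

# Hypothetical instances linked by a bilingual resource; the task is
# a binary is-a (hyponymy) judgment on candidate pairs.
links = [("animal/dog", "動物/犬"), ("animal/cat", "動物/猫")]
en_labeled = [("animal/dog", True)]   # seed labels in English only
ja_labeled = []

for _ in range(2):  # a couple of co-training rounds
    en_model = train(en_labeled)
    ja_model = train(ja_labeled)
    for en_x, ja_x in links:
        label, conf = predict(en_model, en_x)
        if conf > 0.9 and (ja_x, label) not in ja_labeled:
            ja_labeled.append((ja_x, label))   # project EN -> JA
        label, conf = predict(ja_model, ja_x)
        if conf > 0.9 and (en_x, label) not in en_labeled:
            en_labeled.append((en_x, label))   # project JA -> EN

print(ja_labeled)  # the Japanese side now carries a projected label
```

The point of the loop is that each side's confident decisions become cheap training data for the other side, which is why the paper can match single-classifier performance with the same combined amount of labeled data.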

Collaboration


Dive into Jong-Hoon Oh's collaborations.

Top Co-Authors

Chikara Hashimoto, National Institute of Information and Communications Technology
Jun’ichi Kazama, National Institute of Information and Communications Technology
Stijn De Saeger, National Institute of Information and Communications Technology
Hitoshi Isahara, National Institute of Information and Communications Technology
Julien Kloetzer, National Institute of Information and Communications Technology
Ichiro Yamada, National Institute of Information and Communications Technology
Kiyotaka Uchimoto, National Institute of Information and Communications Technology
Kiyonori Ootake, National Institute of Information and Communications Technology