Publication


Featured research published by Key-Sun Choi.


International Conference on Computational Linguistics | 2002

An English-Korean transliteration model using pronunciation and contextual rules

Jong-Hoon Oh; Key-Sun Choi

Interest in English-Korean (E-K) transliteration has been increasing recently. In previous work, methods that convert English letters directly to Korean letters were the main research topic. In this paper, we present an E-K transliteration model using pronunciation and contextual rules. Unlike previous work, our method uses phonetic information such as phonemes and their context. We also use word formation information such as English words of Greek origin. With this information, our method shows a significant performance increase of about 31% in word accuracy.


Information Processing and Management | 2001

Re-ranking model based on document clusters

Kyung-Soon Lee; Yc Park; Key-Sun Choi

In this paper, we describe a model of an information retrieval system based on a document re-ranking method that uses document clusters. In the first step, we retrieve documents using the inverted-file method. Next, we analyze the retrieved documents using document clusters and re-rank them. In this step, we use static clusters and a dynamic cluster view. Consequently, we can produce clusters that are tailored to the characteristics of the query. We focus on the merits of the inverted-file method and cluster analysis: we retrieve documents with the inverted-file method and analyze all terms in each document through cluster analysis. With these two steps, the retrieved results take into account the context of all terms in a document as well as the query terms. We show that our method achieves significant improvements over a method based on similarity search ranking alone.
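The two-step idea in this abstract (retrieve first, then re-rank using clusters) can be illustrated with a minimal Python sketch. This is not the authors' model: the term-overlap retrieval step, the cosine scoring, the precomputed `clusters` mapping, and the mixing weight `alpha` are all assumptions introduced for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(query, docs, clusters, alpha=0.5):
    """Re-rank retrieved documents using their cluster as extra evidence.

    docs:     {doc_id: text}
    clusters: {doc_id: cluster_id}, a hypothetical precomputed clustering
    alpha:    hypothetical weight between document and cluster similarity
    """
    q = Counter(query.lower().split())
    vecs = {d: Counter(t.lower().split()) for d, t in docs.items()}

    # Step 1: keep documents that share a term with the query
    # (a stand-in for inverted-file retrieval).
    retrieved = [d for d, v in vecs.items() if any(t in v for t in q)]

    # Step 2: build one centroid per cluster from the retrieved documents,
    # so the cluster view depends on the query.
    centroids = {}
    for d in retrieved:
        centroids.setdefault(clusters[d], Counter()).update(vecs[d])

    # Final score mixes query-document and query-cluster similarity.
    scores = {
        d: alpha * cosine(q, vecs[d])
        + (1 - alpha) * cosine(q, centroids[clusters[d]])
        for d in retrieved
    }
    return sorted(retrieved, key=lambda d: scores[d], reverse=True)
```

Because the centroid contains all terms of the documents in a cluster, a document can be promoted by the context of its neighbours even when it matches few query terms directly.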


Information Processing and Management | 2006

Incremental cue phrase learning and bootstrapping method for causality extraction using cue phrase and word pair probabilities

Du-Seong Chang; Key-Sun Choi

This work aims to extract possible causal relations that exist between noun phrases. Some causal relations are manifested by lexical patterns such as causal verbs and their sub-categorization. We use lexical patterns as a filter to find causality candidates, and we recast the causality extraction problem as binary classification. To solve this problem, we introduce probabilities for word pairs and concept pairs that could be part of causal noun phrase pairs. We also use the probability that a cue phrase is a causality pattern. These probabilities are learned from a raw corpus in an unsupervised manner. With this probabilistic model, we increase both precision and recall. Our causality extraction shows an F-score of 77.37%, which is an improvement of 21.14 percentage points over the baseline model. Long-distance causal relations are extracted with binary tree-styled cue phrases. We propose an incremental cue phrase learning method based on a cue phrase confidence score measured after each causal classifier learning step. Recall improves by 15.37 percentage points after cue phrase learning.


International Joint Conference on Natural Language Processing | 2004

Causal relation extraction using cue phrase and lexical pair probabilities

Du-Seong Chang; Key-Sun Choi

This work aims to extract causal relations that exist between two events expressed by noun phrases or sentences. Previous work on causality made use of causal patterns such as causal verbs. We concentrate on the information obtained from other causal event pairs. If two event pairs share some lexical pairs and one of them is revealed to be causally related, the causal probability of the other event pair tends to increase. We introduce the lexical pair probability and the cue phrase probability. These probabilities are learned from a raw corpus in an unsupervised manner. With these probabilities and the Naive Bayes classifier, we address the causal relation extraction problem. Our inter-NP causal relation extraction shows a precision of 81.29%, a 7.05% improvement over the baseline model. The proposed models are also applied to inter-sentence causal relation extraction.
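As a rough illustration of how lexical pair probabilities can feed a Naive Bayes decision, consider the sketch below. It departs from the abstract in one important way: the paper learns its probabilities from a raw corpus without supervision, whereas this sketch assumes a small labelled list of event pairs for simplicity. The function names, the add-one smoothing, and the data layout are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def train(pairs):
    """Count lexical pairs per class.

    pairs: list of (cause_words, effect_words, is_causal) tuples.
    ASSUMPTION: labelled examples; the paper uses unsupervised learning.
    """
    counts = {True: defaultdict(int), False: defaultdict(int)}
    totals = {True: 0, False: 0}
    class_counts = {True: 0, False: 0}
    for cause, effect, label in pairs:
        class_counts[label] += 1
        # Every (cause word, effect word) combination is one lexical pair.
        for lexical_pair in product(cause, effect):
            counts[label][lexical_pair] += 1
            totals[label] += 1
    vocab_size = len(set(counts[True]) | set(counts[False]))
    return counts, totals, class_counts, vocab_size


def is_causal(model, cause, effect):
    """Naive Bayes decision with add-one smoothing over lexical pairs."""
    counts, totals, class_counts, vocab_size = model
    n = sum(class_counts.values())
    score = {}
    for label in (True, False):
        s = math.log((class_counts[label] + 1) / (n + 2))
        for lexical_pair in product(cause, effect):
            s += math.log(
                (counts[label].get(lexical_pair, 0) + 1)
                / (totals[label] + vocab_size + 1)
            )
        score[label] = s
    return score[True] > score[False]
```

An event pair that shares lexical pairs with known causal examples receives a higher causal score, which mirrors the intuition stated in the abstract.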


Journal of Artificial Intelligence Research | 2006

A comparison of different machine transliteration models

Jong-Hoon Oh; Key-Sun Choi; Hitoshi Isahara

Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models - grapheme-based transliteration model, phoneme-based transliteration model, hybrid transliteration model, and correspondence-based transliteration model - have been proposed by several researchers. To date, however, there has been little research on a framework in which multiple transliteration models can operate simultaneously. Furthermore, there has been no comparison of the four models within the same framework and using the same data. We addressed these problems by 1) modeling the four models within the same framework, 2) comparing them under the same conditions, and 3) developing a way to improve machine transliteration through this comparison. Our comparison showed that the hybrid and correspondence-based models were the most effective and that the four models can be used in a complementary manner to improve machine transliteration performance.


Information Processing and Management | 1996

Automatic thesaurus construction using Bayesian networks

Young Choon Park; Key-Sun Choi

Automatic thesaurus construction is accomplished by extracting term relations mechanically. A popular method uses statistical analysis to discover the term relations. For low-frequency terms, however, the statistical information of the terms cannot be reliably used for deciding the relationship of terms. This problem is generally referred to as the data-sparseness problem. Unfortunately, many studies have shown that low-frequency terms are of most use in thesaurus construction. This paper characterizes the statistical behavior of terms by using an inference network. A formal approach for the data-sparseness problem, which is crucial in constructing a thesaurus, is developed. The validity of this approach is shown by experiments.


Information Processing and Management | 2007

Patent document categorization based on semantic structural information

Jae-Ho Kim; Key-Sun Choi

The number of patent documents is currently rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual categorization. Because accurate patent classification is crucial to searching for relevant existing patents in a certain field, patent categorization is a very important and useful field. As patent documents are structured documents with characteristics that distinguish them from general documents, these unique traits should be considered in the patent categorization process. In this paper, we categorize Japanese patent documents automatically, focusing on their characteristics: patents are structured by claims, purposes, effects, embodiments of the invention, and so on. We propose a patent document categorization method that uses the k-NN (k-Nearest Neighbour) approach. To retrieve similar documents from a training document set, specific components denoting the so-called semantic elements, such as claim, purpose, and application field, are compared instead of the whole texts. Because those specific components are identified by various user-defined tags, all of the components are first clustered into several semantic elements. Such semantically clustered structural components are the basic features of patent categorization. We achieve a 74% improvement in categorization performance over a baseline system that does not use the structural information of the patent.
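The comparison of semantic elements rather than whole texts can be sketched as a small k-NN classifier in Python. This is an illustration under stated assumptions, not the paper's system: the element names, the cosine similarity over whitespace tokens, the optional per-element `weights`, and the similarity-weighted vote are all hypothetical choices.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def element_similarity(x, y, weights=None):
    """Sum of per-element similarities over the elements both patents have."""
    weights = weights or {}
    total = 0.0
    for elem in x.keys() & y.keys():
        vx = Counter(x[elem].lower().split())
        vy = Counter(y[elem].lower().split())
        total += weights.get(elem, 1.0) * cosine(vx, vy)
    return total


def knn_categorize(query, training, k=3, weights=None):
    """Assign the category with the largest similarity-weighted vote.

    query:    {element_name: text}, e.g. {"claim": ..., "purpose": ...}
    training: non-empty list of ({element_name: text}, category) pairs
    """
    scored = [(element_similarity(query, doc, weights), cat) for doc, cat in training]
    neighbours = sorted(scored, key=lambda item: item[0], reverse=True)[:k]
    votes = Counter()
    for sim, cat in neighbours:
        votes[cat] += sim
    return votes.most_common(1)[0][0]
```

Comparing claim against claim and purpose against purpose keeps a match in one element from being diluted by unrelated text elsewhere in the document.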


Information Processing and Management | 2006

An ensemble of transliteration models for information retrieval

Jong-Hoon Oh; Key-Sun Choi

Transliteration is used to phonetically translate proper names and technical terms, especially from languages written in Roman alphabets to languages written in non-Roman alphabets, such as from English to Korean, Japanese, and Chinese. Because transliterations are usually representative index terms for documents, proper handling of transliterations is important for an effective information retrieval system. However, handling transliterations by dictionary lookup has limitations, because transliterations are usually not registered in the dictionary. For this reason, many researchers have been trying to overcome the problem using machine transliteration. In this paper, we propose a method for improving machine transliteration using an ensemble of three different transliteration models. Because a single transliteration model is limited in its ability to reflect all possible transliteration behaviors, several transliteration models should be used in a complementary manner to achieve a high-performance machine transliteration system. This paper describes a method for producing transliterations with several machine transliteration models and for ranking them with web data and the relevance scores given by each transliteration model. We report evaluation results for our ensemble transliteration model and experimental results for its impact on IR effectiveness. Machine transliteration tests on English-to-Korean and English-to-Japanese transliteration show that our proposed method achieves 78-80% word accuracy. Information retrieval tests on the KTSET and NTCIR-1 test collections show that our transliteration model can improve the performance of an information retrieval system by about 10-34%.
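The ranking step described above, which combines per-model relevance scores with web data, can be sketched as follows. The linear combination, the `web_weight` parameter, the normalization of hit counts, and the romanized candidate strings in the usage example are assumptions for illustration; the paper's actual scoring function is not given in the abstract.

```python
def rank_candidates(model_outputs, web_counts, web_weight=0.5):
    """Rank transliteration candidates produced by several models.

    model_outputs: list of {candidate: relevance_score}, one dict per model
    web_counts:    {candidate: hit_count}, hypothetical web frequency data
    web_weight:    hypothetical weight of the web evidence
    """
    if not model_outputs:
        return []
    # Pool the candidates proposed by any of the models.
    candidates = set().union(*model_outputs)
    total_hits = sum(web_counts.get(c, 0) for c in candidates) or 1

    scored = {}
    for c in candidates:
        # Average relevance across models; a model that did not
        # propose the candidate contributes zero.
        model_score = sum(out.get(c, 0.0) for out in model_outputs) / len(model_outputs)
        web_score = web_counts.get(c, 0) / total_hits
        scored[c] = (1 - web_weight) * model_score + web_weight * web_score
    return sorted(scored, key=scored.get, reverse=True)


# Usage with made-up scores from three hypothetical models:
outputs = [
    {"deiteo": 0.6, "deita": 0.4},
    {"deiteo": 0.5, "data": 0.5},
    {"deita": 0.7, "deiteo": 0.3},
]
ranking = rank_candidates(outputs, {"deiteo": 900, "deita": 100})
```

A candidate that several models agree on and that also appears frequently on the web rises to the top, while a candidate proposed by only one model and unseen on the web falls to the bottom.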


International Joint Conference on Natural Language Processing | 2005

An ensemble of grapheme and phoneme for machine transliteration

Jong-Hoon Oh; Key-Sun Choi

Machine transliteration is an automatic method for generating characters or words in one alphabetical system for the corresponding characters in another alphabetical system. There has been increasing interest in machine transliteration as an aid to machine translation and information retrieval. Three machine transliteration models, including the “grapheme-based model”, the “phoneme-based model”, and the “hybrid model”, have been proposed. However, few works try to make use of the correspondence between source grapheme and phoneme, although the correspondence plays an important role in machine transliteration. Furthermore, few works handle source grapheme and phoneme dynamically. In this paper, we propose a new transliteration model based on an ensemble of grapheme and phoneme. Our model makes use of the correspondence and dynamically uses source grapheme and phoneme. Our method outperforms previous work by about 15-23% in English-to-Korean transliteration and by about 15-43% in English-to-Japanese transliteration.


IEEE Transactions on Software Engineering | 1992

Two-dimensional specification of universal quantification in a graphical database query language

Kyu-Young Whang; Ashok Malhotra; Gary H. Sockut; Luanne M. Burns; Key-Sun Choi

A technique is proposed for specifying universal quantification and existential quantification (combined with negation) in a two-dimensional (graphical) database query language. Unlike other approaches that provide set operators to simulate universal quantification, this technique allows a direct representation of universal quantification. Syntactic constructs for specifying universal and existential quantifications, two-dimensional translation of universal quantification to existential quantification (with negation), and translation of existentially quantified two-dimensional queries to relational queries are presented. The resulting relational queries can be processed directly by many existing database systems. The authors claim that this technique renders universal quantifications easy to understand. To substantiate this claim, they provide a simple, easy-to-follow guideline for constructing universally quantified queries.
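The translation of universal quantification to existential quantification with negation rests on the logical equivalence that "for all x, P(x)" holds exactly when "there is no x such that not P(x)". The Python sketch below shows the two forms on a hypothetical supplier-parts relation; the relation, the function names, and the query are illustrative assumptions and do not reproduce the paper's graphical syntax.

```python
def suppliers_of_all_parts(supplies, parts):
    """Direct universal form: for every part p, (s, p) is in supplies."""
    suppliers = {s for s, _ in supplies}
    return {s for s in suppliers if all((s, p) in supplies for p in parts)}


def suppliers_of_all_parts_not_exists(supplies, parts):
    """Translated form: no part p exists with (s, p) missing from supplies."""
    suppliers = {s for s, _ in supplies}
    return {s for s in suppliers if not any((s, p) not in supplies for p in parts)}


# Hypothetical data: s1 supplies both parts, s2 only one.
supplies = {("s1", "p1"), ("s1", "p2"), ("s2", "p1")}
parts = {"p1", "p2"}
direct = suppliers_of_all_parts(supplies, parts)
translated = suppliers_of_all_parts_not_exists(supplies, parts)
```

The second form corresponds to the doubly nested "not exists" pattern that relational systems can evaluate directly, which is why a translation of this kind lets existing database systems process universally quantified queries.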

Collaboration


Dive into Key-Sun Choi's collaborations.

Top Co-Authors

Jong-Hoon Oh

National Institute of Information and Communications Technology

Kyung-Soon Lee

Chonbuk National University
