Kyongho Min
Auckland University of Technology
Publications
Featured research published by Kyongho Min.
Pacific Rim International Conference on Artificial Intelligence | 2004
Hilda Ho; Kyongho Min; Wai K. Yeap
This paper describes a knowledge-poor anaphora resolution approach based on a shallow meaning representation of sentences. The structure afforded by such a representation provides immediate identification of the local domains required for resolving pronominal anaphora. Other kinds of information used include syntactic information, structural parallelism, and salience weights. We collected 111 third-person singular pronouns from open-domain resources such as children's novels and examples from several anaphora resolution papers; 94 of them are pronominal anaphors in the test data. The system successfully resolves 78.4% of the anaphoric examples.
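As a rough illustration of the salience-weight idea described above, the sketch below ranks agreeing candidate antecedents by a simple salience score. The features, weights, and agreement checks are invented for this example and are not the paper's actual implementation.

```python
# Minimal sketch of salience-weighted pronoun resolution. All weights and
# candidate features here are illustrative assumptions, not the paper's values.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    text: str
    gender: str       # "m", "f", or "n"
    number: str       # "sg" or "pl"
    is_subject: bool  # subjects are usually more salient
    distance: int     # sentences back from the pronoun (0 = same sentence)

def salience(c: Candidate) -> float:
    """Combine simple salience factors into a single score."""
    score = 50.0               # base weight for being a noun phrase at all
    if c.is_subject:
        score += 80            # grammatical-role preference
    score -= 50 * c.distance   # recency: penalise distant candidates
    return score

def resolve(gender: str, number: str, candidates: list) -> Optional[Candidate]:
    """Return the highest-salience candidate that agrees with the pronoun."""
    compatible = [c for c in candidates if c.gender == gender and c.number == number]
    return max(compatible, key=salience, default=None)

if __name__ == "__main__":
    # "Anna gave Tom a book. She thanked him."  -> resolve "she"
    cands = [Candidate("Anna", "f", "sg", True, 1),
             Candidate("Tom", "m", "sg", False, 1),
             Candidate("a book", "n", "sg", False, 1)]
    print(resolve("f", "sg", cands).text)  # Anna
```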
Australasian Joint Conference on Artificial Intelligence | 2005
Kyongho Min; William H. Wilson; Yoo-Jin Moon
This paper describes the interpretation of numerals, and of strings that combine a number with words or symbols indicating whether the string denotes a SPEED, a LENGTH, and so on. The interpretation is done at three levels: lexical, syntactic, and semantic. The system employs three interpretation processes: a word-trigram constructor with a tokeniser, a rule-based processor of number strings, and n-gram-based disambiguation of meanings. We extracted numeral strings from 378 online newspaper articles, finding that, on average, they comprised about 2.2% of the words in the articles. We chose 287 of these articles to provide unseen test data (3,251 numeral strings), and used the remaining 91 articles to provide 886 numeral strings for manually extracting n-gram constraints to disambiguate the meanings of the numeral strings. We implemented six disambiguation methods based on category frequency statistics collected from the sample data and on the number of word-trigram constraints of each category. Precision ratios for the six methods on the test data ranged from 85.6% to 87.9%.
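The sketch below shows one plausible form of n-gram-based disambiguation: pick the most frequent category for a numeral string's trigram context. The trigram table, category names, and frequencies are invented for illustration; the paper's six methods and constraint tables are not reproduced here.

```python
# Illustrative n-gram-based disambiguation of numeral-string categories.
# (left word, <NUM>, right word) -> {category: frequency}, collected from sample data.
TRIGRAM_STATS = {
    ("at", "<NUM>", "km/h"): {"SPEED": 12},
    ("about", "<NUM>", "metres"): {"LENGTH": 9, "QUANTITY": 2},
    ("in", "<NUM>", ","): {"YEAR": 15, "QUANTITY": 3},
}

def classify(left: str, right: str) -> str:
    """Pick the most frequent category for the trigram context of a numeral string."""
    stats = TRIGRAM_STATS.get((left, "<NUM>", right))
    if not stats:
        return "NUMBER"  # fall back to a default category for unseen contexts
    return max(stats, key=stats.get)

if __name__ == "__main__":
    print(classify("at", "km/h"))       # SPEED
    print(classify("about", "metres"))  # LENGTH
    print(classify("over", "people"))   # NUMBER (unseen context)
```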
Meeting of the Association for Computational Linguistics | 1998
Kyongho Min; William H. Wilson
This paper describes a system that performs hierarchical error repair for ill-formed sentences, with heterarchical control of chart items produced at the lexical, syntactic, and semantic levels. The system uses an augmented context-free grammar and employs a bidirectional chart parsing algorithm. It is composed of four subsystems, for lexical, syntactic, surface case, and semantic processing, controlled by an integrated-agenda system. The system employs one parser for well-formed sentences and a second parser for repairing single-error sentences. Possible repairs are ranked by penalty scores based on both grammar-dependent factors (e.g. the significance of the repaired constituent in a local tree) and grammar-independent factors (e.g. error types). This paper focuses on the heterarchical processing of integrated-agenda items (i.e. chart items) at three levels, in the context of single error recovery.
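A minimal sketch of penalty-score ranking of candidate repairs is shown below, combining an error-type penalty with a constituent-significance penalty. The weight values and error types are assumptions for illustration only, not the system's actual scoring scheme.

```python
# Hedged sketch: rank candidate repairs by a penalty score (lower is better).
ERROR_TYPE_PENALTY = {"substitute": 1.0, "insert": 1.5, "delete": 2.0}

def penalty(error_type: str, constituent_significance: float) -> float:
    """Cheap error types applied to peripheral constituents score lowest."""
    return ERROR_TYPE_PENALTY[error_type] + constituent_significance

def rank(repairs):
    """Sort candidate repairs (description, error_type, significance) by penalty."""
    return sorted(repairs, key=lambda r: penalty(r[1], r[2]))

if __name__ == "__main__":
    candidates = [
        ("replace 'boy' with 'boys'", "substitute", 0.3),
        ("insert missing determiner", "insert", 0.8),
        ("delete spurious verb", "delete", 0.5),
    ]
    for description, *_ in rank(candidates):
        print(description)
```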
Ninth International Conference on Information Visualisation (IV'05) | 2005
Wai-Kiang Yeap; Paul Reedy; Kyongho Min; Hilda Ho
We implemented SmartINFO, an experimental system for visualizing the meaning of texts. SmartINFO consists of four modules: a universal grammar engine (UGE), an anaphora engine, a concept engine, and a visualization engine. We discuss two methods of visualizing the meanings of texts: one word-centered, the other clause-centered.
Australian Joint Conference on Artificial Intelligence | 2002
Kyongho Min; William H. Wilson; Yoo-Jin Moon
This paper describes methods of document classification for a highly inflectional/derivational language that forms monolithic compound noun terms, such as Dutch and Korean. The system is composed of three phases: (1) a Korean morphological analyzer called HAM (Kang, 1993); (2) compound noun phrase analysis applied to the HAM output, with extraction of terms whose syntactic categories are noun, proper noun (name), verb, and adjective; and (3) a document classification algorithm based on preferred class score heuristics. The paper focuses on comparing classification methods, including a simple heuristic method and preferred class score heuristics employing two factors, ICF (inverted class frequency) and IDF (inverted document frequency), with and without term frequency weighting. It also describes a simple classification approach that requires no learning algorithm, in contrast to a vector space model with complex training and a classification algorithm such as cosine similarity measurement. The experimental results show 95.7% correct classification of 720 training documents and 63.8%-71.3% of 80 randomly chosen test documents across the various methods.
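A minimal sketch of a preferred-class-score style classifier is given below: each extracted term votes for classes, weighted by inverted class frequency (ICF). The term-class table, class names, and ICF formula are illustrative assumptions rather than the paper's actual heuristics.

```python
# Sketch of an ICF-weighted preferred-class-score classifier (illustrative only).
import math
from collections import defaultdict

TERM_CLASSES = {             # term -> classes it was observed in (training side)
    "경제": {"economy"},
    "시장": {"economy", "politics"},
    "선거": {"politics"},
    "축구": {"sports"},
}
NUM_CLASSES = 3

def icf(term: str) -> float:
    """Inverted class frequency: terms seen in fewer classes are more discriminative."""
    n = len(TERM_CLASSES.get(term, ())) or NUM_CLASSES
    return math.log(NUM_CLASSES / n) + 1.0

def classify(terms: list) -> str:
    """Sum ICF-weighted votes for each class and return the preferred class."""
    scores = defaultdict(float)
    for t in terms:
        for c in TERM_CLASSES.get(t, ()):
            scores[c] += icf(t)
    return max(scores, key=scores.get) if scores else "unknown"

if __name__ == "__main__":
    print(classify(["경제", "시장"]))  # economy
    print(classify(["선거", "시장"]))  # politics
```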
Australasian Joint Conference on Artificial Intelligence | 2007
Kyongho Min; William H. Wilson; Byeong Ho Kang
This paper describes and compares methods based on N-grams (specifically trigrams and pentagrams), together with five features, for recognising the syntactic and semantic categories of numeral strings representing money, number, date, etc., in texts. The system employs three interpretation processes: word N-gram construction with a tokeniser; rule-based processing of numeral strings; and N-gram-based classification. We extracted numeral strings from 1,111 online newspaper articles. For numeral string interpretation, we chose 112 (10%) of the 1,111 articles to provide unseen test data (1,278 numeral strings), and used the remaining 999 articles to provide 11,525 numeral strings for extracting N-gram-based constraints to disambiguate the meanings of the numeral strings. The word trigram method achieved 83.8% precision, 81.2% recall, and 82.5% F-measure; the word pentagram method achieved 86.6% precision, 82.9% recall, and 84.7% F-measure.
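For reference, the small example below shows how the precision, recall, and F-measure figures quoted above relate. The counts are approximate back-calculations from the reported trigram percentages and the 1,278-string test set, used purely to illustrate the formulas.

```python
# Worked example of precision, recall, and F-measure (harmonic mean).
def precision_recall_f(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

if __name__ == "__main__":
    # ~1,038 correct interpretations, ~200 wrong outputs, ~240 missed strings
    p, r, f = precision_recall_f(1038, 200, 240)
    print(f"precision={p:.1%} recall={r:.1%} F={f:.1%}")  # ~83.8%, ~81.2%, ~82.5%
```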
Australian Joint Conference on Artificial Intelligence | 2006
Kyongho Min; William H. Wilson
This paper describes a performance comparison of two approaches to numeral string interpretation: manually generated rule-based interpretation of numerals and strings including numerals [8] versus automatically generated feature-based interpretation. The system employs three interpretation processes: word trigram construction with a tokeniser, rule-based processing of number strings, and n-gram-based classification. We extracted numeral strings from 378 online newspaper articles, finding that, on average, they comprised about 2.2% of the words in the articles. For feature-based interpretation, we tested on 11 datasets, with random selection of sample data to extract tabular feature-based constraints. The rule-based approach achieved 86.8% precision and 77.1% recall; the feature-based approach achieved 83.1% precision and 74.5% recall.
Intelligent Data Engineering and Automated Learning | 2003
Kyongho Min
This paper describes relationships between document classification performance and relevant factors for a highly inflectional language that forms monolithic compound noun terms. The factors are the number of class feature sets, the size of the training or testing document set, the ratio of overlapping class features among the eight classes, and the ratio of non-overlapping class feature sets. The system is composed of three phases: a Korean morphological analyser called HAM [11]; compound noun phrase analysis and extraction of terms whose syntactic categories are noun, name, verb, and adjective; and a document classification algorithm based on preferred class score heuristics. The best algorithm in this paper, weighted PCSICF based on inverse class frequency, shows performance that is inversely proportional to the number of class feature sets and to the ratio of non-overlapping class feature sets.
Australasian Joint Conference on Artificial Intelligence | 2003
Kyongho Min; William H. Wilson; Yoo-Jin Moon
Unlike compound noun terms in English and French, where the component words are separated by white space, Korean compound noun terms are not separated by white space. In addition, some compound noun terms found in real text result from spacing errors. The analysis of compound noun terms is therefore a difficult task in Korean NLP. Systems based on probabilistic and statistical information extracted from a corpus have shown good performance on Korean compound noun analysis; however, if the domain of the deployed system extends beyond that of the training corpus, performance on compound noun analysis may not be consistent. In this paper, we describe the analysis of Korean compound noun terms based on a longest-substring algorithm and an agenda-based chart parsing technique, with a simple heuristic method to resolve ambiguities among the analyses. The system successfully analysed 95.6% of the test data (6,024 compound noun terms), which ranged from 2 to 11 syllables in length. The ambiguity ranged from 1 to 33 analyses per compound noun term.
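The sketch below illustrates a greedy longest-match segmentation of a compound noun against a small lexicon, in the spirit of the longest-substring idea above. The tiny lexicon and the purely greedy strategy are assumptions for illustration; the paper additionally uses an agenda-based chart parser and heuristics to rank multiple analyses.

```python
# Greedy longest-match segmentation of a compound noun term (illustrative only).
from typing import Optional

LEXICON = {"정보", "검색", "시스템", "정보검색"}  # "information", "retrieval", "system", ...

def segment(term: str) -> Optional[list]:
    """Return a longest-match segmentation, or None if the term cannot be covered."""
    result, i = [], 0
    while i < len(term):
        for j in range(len(term), i, -1):   # try the longest substring first
            if term[i:j] in LEXICON:
                result.append(term[i:j])
                i = j
                break
        else:
            return None                      # no lexicon entry covers position i
    return result

if __name__ == "__main__":
    print(segment("정보검색시스템"))  # ['정보검색', '시스템']
```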
Australian Joint Conference on Artificial Intelligence | 1997
Kyongho Min; William H. Wilson
This paper describes a system that performs hierarchical error recovery, detecting and correcting a single error in a sentence at the lexical, syntactic, and/or semantic levels. If the system is unable to repair an erroneous sentence under the assumption that it contains a single error, a multiple-error recovery system is invoked. The system employs a chart parsing algorithm with an augmented context-free grammar, and has subsystems for lexical, syntactic, surface case, and semantic processing, controlled by an integrated-agenda system. In the frequent case that there is a choice of possible repairs, the candidates are ranked by penalty scores based on grammar-dependent and grammar-independent heuristics. The grammar-independent heuristics involve error types and, at the lexical level, character distance; the grammar-dependent ones involve, at the syntactic level, the significance of the repaired constituent in a local tree, and, at the semantic level, the distance between the semantic form containing the error and normal act templates. This paper focuses on single error recovery.
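As a concrete illustration of the lexical-level "character distance" factor mentioned above, the sketch below ranks dictionary words by plain Levenshtein edit distance to an unknown word. Using raw edit distance as the penalty, and the toy dictionary, are assumptions for this example rather than the system's actual scoring.

```python
# Sketch: rank lexical repair candidates by character (edit) distance.
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def rank_repairs(unknown_word: str, dictionary: list) -> list:
    """Order dictionary words by character distance to the unknown word."""
    return sorted(dictionary, key=lambda w: edit_distance(unknown_word, w))

if __name__ == "__main__":
    print(rank_repairs("sentense", ["sentence", "sense", "tense", "dense"]))
    # 'sentence' (distance 1) ranks first
```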