Gary Geunbae Lee

Pohang University of Science and Technology

Publication

Featured research published by Gary Geunbae Lee.

Information Processing and Management | 2006

Information gain and divergence-based feature selection for machine learning-based text categorization

Changki Lee; Gary Geunbae Lee

Most previous work on feature selection has emphasized only reducing the high dimensionality of the feature space. But when many features are highly redundant with each other, other means are needed, for example, more complex dependence models such as Bayesian network classifiers. In this paper, we introduce a new information gain and divergence-based feature selection method for statistical machine learning-based text categorization that does not rely on more complex dependence models. Our feature selection method strives to reduce redundancy between features while maintaining information gain in selecting appropriate features for text categorization. Empirical results on a number of datasets show that our feature selection method is more effective than Koller and Sahami's method [Koller, D., & Sahami, M. (1996). Toward optimal feature selection. In Proceedings of ICML-96, 13th International Conference on Machine Learning], a greedy feature selection method, and than conventional information gain, which is commonly used in feature selection for text categorization. Moreover, our feature selection method sometimes enables conventional machine learning algorithms to improve over support vector machines, which are known to give the best classification accuracy.
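
Conventional information gain, the baseline this abstract compares against, scores each term by how much knowing its presence or absence reduces uncertainty about the class label. The Python sketch below computes that baseline score for binary term features and ranks a vocabulary by it. It does not implement the authors' divergence-based redundancy reduction, and the function names and any toy corpus passed to it are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Conventional IG of a binary 'term present/absent' feature (baseline only).

    docs: list of token sets; labels: parallel list of class labels.
    """
    n = len(docs)
    h_class = entropy(list(Counter(labels).values()))
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    # Conditional entropy H(C | term), weighted by how often each branch occurs.
    h_cond = sum(
        len(part) / n * entropy(list(Counter(part).values()))
        for part in (with_term, without_term)
        if part
    )
    return h_class - h_cond


def select_top_k(docs, labels, k):
    """Rank every term by IG independently and keep the k best (ties: alphabetical)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: -information_gain(docs, labels, t))
    return ranked[:k]
```

In a toy corpus where "goal" occurs only in sports documents and "vote" only in politics documents, both terms receive the maximum score of 1 bit. Each term is scored independently, so two highly redundant terms can both rank near the top. Reducing that redundancy is what the paper's method adds.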

Speech Communication | 2009

Example-based dialog modeling for practical multi-domain dialog system

Cheongjae Lee; Sangkeun Jung; Seokhwan Kim; Gary Geunbae Lee

This paper proposes a generic dialog modeling framework for a multi-domain dialog system that simultaneously manages goal-oriented and chat dialogs for both information access and entertainment. We developed a dialog modeling technique using an example-based approach to implement multiple applications such as car navigation, weather information, TV program guidance, and a chatbot. Example-based dialog modeling (EBDM) is a simple and effective method for prototyping and deploying various dialog systems. This paper also introduces the system architecture of multi-domain dialog systems using the EBDM framework and the domain spotting technique. In our experiments, we evaluate our system with both simulated and real users. We expect that our approach can support flexible management of multi-domain dialogs within the same framework.
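
The abstract does not specify how examples are represented or matched. As a rough illustration only, the sketch below assumes that each stored example records a domain, a user dialog act, a set of filled slots, and the system action taken. A new turn is answered with the action of the most similar stored example, measured by the Jaccard overlap of slot sets. The `DialogExample` fields, the similarity measure, and the `clarify` fallback are all hypothetical, and domain spotting is assumed to have happened upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogExample:
    """Hypothetical example record: context features plus the action taken."""
    domain: str
    user_act: str
    slots: frozenset
    system_action: str


def jaccard(a, b):
    """Overlap of two slot sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_action(examples, domain, user_act, slots, default="clarify"):
    """Return the system action of the most similar stored example.

    Examples are first filtered to the spotted domain and the user's dialog act,
    then ranked by slot overlap. The fallback action is an assumption.
    """
    candidates = [e for e in examples if e.domain == domain and e.user_act == user_act]
    if not candidates:
        return default
    best = max(candidates, key=lambda e: jaccard(e.slots, frozenset(slots)))
    return best.system_action
```

Keeping retrieval this simple reflects the "prototyping and deploying" motivation in the abstract: adding behavior means adding examples rather than writing new dialog rules.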

Computer Speech & Language | 2009

Data-driven user simulation for automated evaluation of spoken dialog systems

Sangkeun Jung; Cheongjae Lee; Kyungduk Kim; Minwoo Jeong; Gary Geunbae Lee

This paper proposes a novel integrated dialog simulation technique for evaluating spoken dialog systems. A data-driven user simulation technique for simulating user intention and utterance is introduced. A novel method for modeling and generating user intention using a linear-chain conditional random field is proposed, and a two-phase data-driven domain-specific user utterance simulation method and a linguistic knowledge-based ASR channel simulation method are also presented. Evaluation metrics are introduced to measure the quality of user simulation at the intention and utterance levels. Experiments using these techniques were carried out to evaluate the performance and behavior of dialog systems designed for car navigation dialogs and a building guide robot. They showed that our approach was easy to set up and exhibited tendencies similar to those of real human users.
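
The ASR channel simulation described here is linguistic knowledge-based. The sketch below is a much simpler, generic stand-in that only shows where such a component sits: it takes a simulated user utterance and injects random word deletions and substitutions at a chosen error rate. The function name, the default 50/50 split between deletions and substitutions, and the uniform choice of substitute words are assumptions, not the paper's method.

```python
import random


def simulate_asr_errors(words, error_rate, vocabulary, rng, deletion_share=0.5):
    """Inject uniform random word errors into a simulated user utterance.

    Each word is corrupted with probability error_rate. A corrupted word is deleted
    with probability deletion_share, otherwise replaced by a random vocabulary word.
    This is a generic noise model, not a linguistic knowledge-based one.
    """
    noisy = []
    for word in words:
        if rng.random() < error_rate:
            if rng.random() < deletion_share:
                continue  # deletion error
            noisy.append(rng.choice(vocabulary))  # substitution error
        else:
            noisy.append(word)  # recognized correctly
    return noisy
```

Passing an explicit `random.Random(seed)` keeps simulated evaluation runs reproducible.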

Journal of Computing Science and Engineering | 2010

Recent Approaches to Dialog Management for Spoken Dialog Systems

Cheongjae Lee; Sangkeun Jung; Kyungduk Kim; Donghyeon Lee; Gary Geunbae Lee

Spoken dialog systems are a rapidly growing research area, because improvements in speech technology make it possible to build systems that people can easily operate to access useful information through spoken language. Among the components of a spoken dialog system, dialog management plays major roles such as discourse analysis, database access, error handling, and system action prediction. This survey covers design issues and recent approaches to dialog management techniques for modeling dialogs. We also explain user simulation techniques for the automatic evaluation of spoken dialog systems.

ReCALL | 2011

On the effectiveness of robot-assisted language learning

Sungjin Lee; Hyungjong Noh; Jonghoon Lee; Kyusong Lee; Gary Geunbae Lee; Seongdae Sagong; Munsang Kim

This study introduces the educational assistant robots that we developed for foreign language learning and explores the effectiveness of robot-assisted language learning (RALL), which is still in its early stages. To this end, a course was designed in which students have meaningful interactions with intelligent robots in an immersive environment. A total of 24 elementary school students, aged ten to twelve, were enrolled in English lessons. A pre-test/post-test design was used to investigate the cognitive effects of the RALL approach on the students' oral skills. No significant difference was found in listening skills, but speaking skills improved with a large effect size at the 0.01 significance level. Descriptive statistics and the pre-test/post-test design were used to investigate the affective effects of the RALL approach. The results showed that RALL promoted and improved students' satisfaction, interest, confidence, and motivation at the 0.01 significance level.
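
The abstract reports a large effect size without naming the measure. One common choice for a pre-test/post-test design on the same students is Cohen's d for paired samples (d_z). It is the mean of the per-student score differences divided by their standard deviation, and the related paired t statistic equals d_z multiplied by the square root of the number of students. The sketch below assumes that measure. The function names are hypothetical, and any scores passed to it are placeholders, not the study's data.

```python
import math
import statistics


def paired_effect_size(pre, post):
    """Cohen's d_z: mean per-subject gain divided by the SD of the gains."""
    gains = [after - before for before, after in zip(pre, post)]
    return statistics.mean(gains) / statistics.stdev(gains)


def paired_t(pre, post):
    """Paired t statistic, equal to d_z times sqrt(n)."""
    return paired_effect_size(pre, post) * math.sqrt(len(pre))
```

For example, the invented pre-test scores [10, 12, 9, 11] and post-test scores [13, 14, 12, 15] give differences [3, 2, 3, 4] and d_z = 3 / sqrt(2/3) ≈ 3.67. By Cohen's conventional benchmarks, a value above 0.8 is usually read as large.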

Empirical Methods in Natural Language Processing | 2009

Semi-supervised Speech Act Recognition in Emails and Forums

Minwoo Jeong; Chin-Yew Lin; Gary Geunbae Lee

In this paper, we present a semi-supervised method for automatic speech act recognition in emails and forums. The major challenge of this task is the lack of labeled data in these two genres. To address this problem, our method leverages labeled data in the Switchboard-DAMSL and Meeting Recorder Dialog Act databases and applies simple domain adaptation techniques over a large amount of unlabeled email and forum data. For semi-supervised learning, our method uses automatically extracted features such as phrases and dependency trees, called subtree features. Empirical results demonstrate that our model is effective for speech act recognition in emails and forums.

IEEE Transactions on Audio, Speech, and Language Processing | 2008

Triangular-Chain Conditional Random Fields

Minwoo Jeong; Gary Geunbae Lee

Sequential modeling is a fundamental task in scientific fields, especially in speech and natural language processing, where many problems involving sequential data can be cast as sequence labeling or sequence classification. In many applications, the two problems are correlated, for example, named entity recognition and dialog act classification in spoken language understanding. This paper presents triangular-chain conditional random fields (CRFs), a unified probabilistic model that combines two related problems. Triangular-chain CRFs jointly represent the sequence and meta-sequence labels in a single graphical structure that both explicitly encodes their dependencies and preserves uncertainty between them. An efficient inference and parameter estimation method for triangular-chain CRFs is described, obtained by extending linear-chain CRFs. This method outperforms baseline models on synthetic data and on real-world dialog data for spoken language understanding.
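
Triangular-chain CRFs are built by extending linear-chain CRFs, whose best label sequence is found with Viterbi decoding. The sketch below shows only that linear-chain step over precomputed log-space scores. It does not model the meta-sequence label (such as the dialog act) or parameter estimation, and the score tables are assumed inputs.

```python
def viterbi(emission, transition, start):
    """Most likely label sequence under a linear-chain model (log-space scores).

    emission[t][y]: score of label y at position t
    transition[p][y]: score of moving from label p to label y
    start[y]: score of starting in label y
    Returns (best_score, best_path).
    """
    n_labels = len(start)
    score = [start[y] + emission[0][y] for y in range(n_labels)]
    backptr = []
    for t in range(1, len(emission)):
        new_score, ptr = [], []
        for y in range(n_labels):
            # Best predecessor label for label y at position t.
            best_prev = max(range(n_labels), key=lambda p: score[p] + transition[p][y])
            new_score.append(score[best_prev] + transition[best_prev][y] + emission[t][y])
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final label.
    last = max(range(n_labels), key=lambda y: score[y])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path
```

With two labels, emission scores [[2, 0], [0, 1], [0, 3]], transition scores [[0, -1], [-1, 0]], and zero start scores, the decoder returns a best score of 5 with path [0, 1, 1].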

JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications | 2004

POSBIOTM-NER in the shared task of BioNLP/NLPBA 2004

Yu Song; Eunju Kim; Gary Geunbae Lee; Byoung-kee Yi

Two classifiers, a Support Vector Machine (SVM) and Conditional Random Fields (CRFs), are applied to the recognition of biomedical named entities. Because the two classifiers have different characteristics, their results are merged to achieve better performance. We propose an automatic corpus expansion method for SVM and CRF to overcome the shortage of annotated training data. In addition, we incorporate a keyword-based post-processing step to deal with remaining problems, such as assigning an appropriate named entity tag to a word or phrase containing parentheses.

North American Chapter of the Association for Computational Linguistics | 2006

MMR-based Active Machine Learning for Bio Named Entity Recognition

Seokhwan Kim; Yu Song; Kyungduk Kim; Jeong-Won Cha; Gary Geunbae Lee

This paper presents a new active learning paradigm that considers not only the uncertainty of the classifier but also the diversity of the corpus. The uncertainty and diversity measures were combined using the MMR (Maximal Marginal Relevance) method to give the sampling scores in our active learning strategy. We incorporated this MMR-based active machine learning idea into a biomedical named-entity recognition system. Our experimental results indicate that our active learning strategies for sample selection can significantly reduce human effort.
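
MMR, as commonly defined, trades off relevance against redundancy with a weight λ. In the active learning setting of this abstract, a candidate's score can be read as λ times the classifier's uncertainty minus (1 - λ) times its maximum similarity to samples already chosen. The sketch below greedily builds a batch with that score. The uncertainty values, the similarity function, and the default λ = 0.7 are assumptions rather than details from the paper.

```python
def mmr_select(candidates, uncertainty, similarity, k, lam=0.7):
    """Greedily choose up to k samples by Maximal Marginal Relevance.

    uncertainty: dict id -> classifier uncertainty (higher = more informative).
    similarity: function (id, id) -> float in [0, 1].
    lam: weight on uncertainty versus redundancy (assumed default).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(c):
            # Redundancy = similarity to the closest sample already in the batch.
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * uncertainty[c] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Suppose λ = 0.5, a and b are two near-duplicate uncertain samples (similarity 0.95), and c is less uncertain but dissimilar. A batch of two then becomes [a, c] rather than [a, b], which is the diversity effect the abstract describes. Setting λ = 1 falls back to pure uncertainty sampling.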

Meeting of the Association for Computational Linguistics | 2003

Automatic Acquisition of Named Entity Tagged Corpus from World Wide Web

Joohui An; Seungwoo Lee; Gary Geunbae Lee

In this paper, we present a method that automatically constructs a Named Entity (NE) tagged corpus from the web for training Named Entity Recognition systems. We use an NE list and a web search engine to collect web documents that contain the NE instances. The documents are refined through sentence separation and text refinement procedures, and the NE instances are finally tagged with the appropriate NE categories. Our experiments demonstrate that the suggested method can acquire a sufficiently large NE-tagged corpus that is as useful as a manually tagged one, without any human intervention.
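
The final step of this pipeline tags NE instances in the refined sentences. The abstract does not say how matching is done, so the sketch below assumes a simple dictionary lookup: it tags a tokenized sentence by greedy longest match against an NE list and emits BIO labels. The web search and refinement stages are omitted, and the category names, the BIO scheme, and the `max_len` limit are illustrative assumptions.

```python
def tag_sentence(tokens, ne_list, max_len=5):
    """Greedy longest-match tagging of tokens against an NE dictionary (BIO tags).

    ne_list: hypothetical dict mapping an NE string to its category.
    max_len: longest NE, in tokens, that will be looked up (assumed limit).
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first so "New York" beats "York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ne_list:
                match = (n, ne_list[phrase])
                break
        if match:
            n, category = match
            tags[i] = "B-" + category
            for j in range(i + 1, i + n):
                tags[j] = "I-" + category
            i += n
        else:
            i += 1
    return tags
```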
