Maengsik Choi | Researchain

Publication

Featured research published by Maengsik Choi.

Information Processing and Management | 2013

Social relation extraction from texts using a support-vector-machine-based dependency trigram kernel

Maengsik Choi; Harksoo Kim

We propose a social relation extraction system using dependency-kernel-based support vector machines (SVMs). The proposed system classifies input sentences containing two people's names according to whether or not they describe a social relation between the two people. The system then extracts relation names (i.e., social-relation keywords) from the sentences that describe social relations. We propose new tree kernels, called dependency trigram kernels, for effectively implementing these processes using SVMs. Experiments showed that the proposed kernels delivered better performance than the existing dependency kernel. On the basis of the experimental evidence, we suggest that the proposed system can be used as a tool for automatically constructing social networks from unstructured texts.

The KIPS Transactions: Part B | 2011

Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning

Maengsik Choi; Harksoo Kim

Unknown morpheme errors in Korean morphological analysis fall into two types: errors in which the morphological analyzer fails to return any morpheme sequence at all, and errors in which it returns incorrect combinations of known morphemes. Most previous unknown morpheme estimation techniques have focused only on the former. This paper proposes an unknown morpheme estimation method that can handle both types of error. The proposed method uses an SVM (support vector machine) to detect Eojeols (Korean spacing units) that may include unknown morpheme errors. Then, using CRFs (conditional random fields), it segments the detected Eojeols into morphemes and annotates the segmented morphemes with new POS tags. In the experiments, the proposed method outperformed the conventional method based on longest matching of functional words. The experimental results indicate that the second type of error should be dealt with in order to improve the performance of Korean morphological analysis.

International Conference on Big Data and Smart Computing | 2016

Bagging-based active learning model for named entity recognition with distant supervision

Sunghee Lee; Yeongkil Song; Maengsik Choi; Harksoo Kim

Named entity recognition (NER) is a preliminary step in information extraction and question answering. Most previous studies on NER have been based on supervised machine learning methods that need a large human-annotated training corpus. In this paper, we propose a semi-supervised NER model to minimize the time-consuming and labor-intensive task of constructing the training corpus. The proposed model generates a weakly labeled training corpus using a distant supervision method. Then, it improves NER accuracy by refining the weakly labeled training corpus using a bagging-based active learning method. In the experiments, the proposed model outperformed the previous semi-supervised model. It achieved an F1-measure of 0.764 after 15 rounds of bagging-based active learning.
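The refinement loop described in this abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the toy per-token tagger, the vote-entropy disagreement score, and all names (`bootstrap`, `train_member`, `select_for_review`, `weak_labels`) are assumptions made for illustration, since the abstract does not specify the base learner or the selection criterion. The human correction step and the retraining loop are omitted.

```python
import math
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw a bootstrap sample: same size as data, sampled with replacement."""
    return [rng.choice(data) for _ in data]

def train_member(sample):
    """Toy tagger (assumption): map each token to its most frequent label."""
    by_token = {}
    for token, label in sample:
        by_token.setdefault(token, Counter())[label] += 1
    return {token: counts.most_common(1)[0][0] for token, counts in by_token.items()}

def vote_entropy(votes):
    """Disagreement among committee votes; zero when all members agree."""
    total = len(votes)
    return sum(-(c / total) * math.log(c / total) for c in Counter(votes).values())

def select_for_review(tokens, committee, k):
    """Return the k tokens whose committee votes disagree the most."""
    def disagreement(token):
        # Tokens a member never saw default to the outside label "O".
        return vote_entropy([member.get(token, "O") for member in committee])
    return sorted(tokens, key=disagreement, reverse=True)[:k]

# Hypothetical weakly labeled data, e.g. produced by distant supervision.
weak_labels = [("Seoul", "LOC"), ("Seoul", "LOC"), ("Samsung", "ORG"),
               ("Samsung", "LOC"), ("Kim", "PER"), ("Kim", "PER")]
rng = random.Random(0)
committee = [train_member(bootstrap(weak_labels, rng)) for _ in range(5)]
to_review = select_for_review(["Seoul", "Samsung", "Kim"], committee, k=1)
```

In a full loop, the items in `to_review` would be corrected and fed back into the training corpus before the next round of bagging.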

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Dependency trigram model for social relation extraction from news articles

Maengsik Choi; Harksoo Kim; Bruce Croft

We propose a kernel-based model to automatically extract social relations, such as economic relations and political relations, between two people from news articles. To determine whether two people are structurally associated with each other, the proposed model uses an SVM (support vector machine) tree kernel based on trigrams of head-dependent relations between them. In the experiments with the automatic content extraction (ACE) corpus and a Korean news corpus, the proposed model outperformed the previous systems based on SVM tree kernels even though it used shallower linguistic knowledge.
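To give a feel for a trigram-based kernel over dependency structure, here is a simplified sketch. It is an assumption-laden illustration rather than the kernel defined in the paper: it treats the dependency path between two person mentions as a flat token sequence and counts shared consecutive trigrams. The function names and the example paths are hypothetical.

```python
from collections import Counter

def path_trigrams(path):
    """Multiset of consecutive 3-token windows over a dependency path."""
    return Counter(tuple(path[i:i + 3]) for i in range(len(path) - 2))

def trigram_kernel(path_a, path_b):
    """Dot product of the two trigram count vectors (a valid kernel)."""
    grams_a, grams_b = path_trigrams(path_a), path_trigrams(path_b)
    return sum(count * grams_b[gram] for gram, count in grams_a.items())

# Hypothetical dependency paths between two person mentions.
path_1 = ["PERSON", "nsubj", "met", "dobj", "PERSON"]
path_2 = ["PERSON", "nsubj", "met", "prep_with", "PERSON"]
similarity = trigram_kernel(path_1, path_2)
```

Because the kernel is an inner product of count vectors, a function like this could be passed to an SVM implementation that accepts a precomputed or callable kernel.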

KIPS Transactions on Software and Data Engineering | 2016

Coreference Resolution for Korean Using Random Forests

Seokwon Jeong; Maengsik Choi; Harksoo Kim

Coreference resolution identifies mentions in documents and groups the mentions that co-refer. It is an essential step for natural language processing applications such as information extraction, event tracking, and question answering. Recently, various coreference resolution models based on machine learning (ML) have been proposed. As is well known, these ML-based models need large amounts of training data manually annotated with co-referred mention tags. Unfortunately, we could not find usable open data for training ML-based models in Korean. Therefore, we propose an efficient coreference resolution model that needs less training data than other ML-based models. The proposed model identifies co-referred mentions using random forests based on sieve-guided features. In the experiments with baseball news articles, the proposed model achieved a CoNLL F1-score of 0.6678, higher than that of other ML-based models.

Journal of KIISE | 2015

Construction of Korean Knowledge Base Based on Machine Learning from Wikipedia

Seokwon Jeong; Maengsik Choi; Harksoo Kim

The performance of many natural language processing applications depends on the knowledge base as a major resource. WordNet, YAGO, Cyc, and BabelNet have been extensively used as knowledge bases in English. In this paper, we propose a method to automatically construct a YAGO-style knowledge base for Korean (hereafter, K-YAGO) from Wikipedia and YAGO. The proposed system constructs an initial K-YAGO simply by matching YAGO to info-boxes in Wikipedia. Then, the initial K-YAGO is expanded using a machine learning technique. Experiments with the initial K-YAGO show that the proposed system has a precision of 0.9642. In the experiments with the expanded part of K-YAGO, an accuracy of 0.9468 was achieved with an average macro F1-measure of 0.7596.

Journal of KIISE | 2015

One-Class Classification Model Based on Lexical Information and Syntactic Patterns

Hyeon-gu Lee; Maengsik Choi; Harksoo Kim

Relation extraction is an important information extraction technique that can be widely used in areas such as question answering and knowledge population. Previous studies on relation extraction have been based on supervised machine learning models that need a large amount of training data manually annotated with relation categories. Recently, distant supervision methods have been proposed to reduce the manual annotation effort of constructing training data. However, these methods have a drawback: they are difficult to use for collecting the negative training data needed for classification. To overcome this drawback, we propose a one-class classification model that can be trained without negative data. The proposed model determines whether an input data item is included in an inner category by using a similarity measure based on lexical information and syntactic patterns in a vector space. In the experiments conducted in this study, the proposed model showed higher performance (an F1-score of 0.6509 and an accuracy of 0.6833) than a representative one-class classification model, the one-class SVM (support vector machine).
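The idea of deciding class membership from similarity in a vector space, using positive examples only, can be sketched as below. This is an illustrative assumption, not the paper's model: it uses plain bag-of-words vectors in place of the lexical and syntactic-pattern features, a summed prototype of the positive examples, cosine similarity, and a hypothetical threshold of 0.5.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_prototype(positive_examples):
    """Sum the positive vectors; cosine is scale-invariant, so no averaging."""
    prototype = Counter()
    for tokens in positive_examples:
        prototype.update(tokens)
    return prototype

def in_category(tokens, prototype, threshold=0.5):
    """Accept an item if it is similar enough to the positive prototype."""
    return cosine(Counter(tokens), prototype) >= threshold

# Hypothetical positive examples for a "birthplace" relation.
positives = [["born", "in", "city"], ["was", "born", "in"]]
prototype = build_prototype(positives)
accepted = in_category(["born", "in"], prototype)
rejected = in_category(["works", "for"], prototype)
```

No negative examples are used at any point; the threshold alone separates items inside the category from everything else.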

MUSIC | 2014

Lexical Feature Extraction Method for Classification of Erroneous Online Customer Reviews Based on Pattern Matching

Maengsik Choi; Harksoo Kim

In morpheme-based languages such as Korean and Japanese, spacing and spelling errors that frequently occur in online documents make it difficult to reliably extract informative lexical clues for sentiment analysis. To overcome this problem, we propose a simple, reliable lexical feature extraction method for sentiment classification systems; this method targets online customer reviews in Korean, which include numerous spacing and spelling errors. The proposed method performs longest-matching between input sentences and two kinds of patterns (spacing-unit patterns and phoneme patterns) that are automatically constructed from a large POS tagged corpus. Thereafter, the method returns content words associated with the longest matched patterns. In the experiments on sentiment classification, the proposed method outperformed previous lexical feature extraction methods, which are based on conventional morphological analyzers.
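Longest matching against a pattern dictionary, as described in this abstract, can be sketched as follows. This is a minimal illustration under stated assumptions: a single plain dictionary stands in for the spacing-unit and phoneme patterns the authors build from a POS-tagged corpus, spaces are simply discarded to tolerate spacing errors, and the English strings are hypothetical stand-ins for Korean input.

```python
def longest_match_extract(text, patterns):
    """Greedy longest matching over text with spaces removed.

    patterns maps a pattern string to the content word it yields.
    """
    compact = text.replace(" ", "")  # ignore (possibly wrong) spacing
    max_len = max(map(len, patterns), default=0)
    found, i = [], 0
    while i < len(compact):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(max_len, len(compact) - i), 0, -1):
            chunk = compact[i:i + size]
            if chunk in patterns:
                found.append(patterns[chunk])
                i += size
                break
        else:
            i += 1  # no pattern starts here; skip one character
    return found

# Hypothetical pattern dictionary and a review with a spacing error.
patterns = {"good": "good", "goodprice": "good price", "fast": "fast"}
words = longest_match_extract("goo dprice fast", patterns)
```

Preferring the longest pattern means "goodprice" wins over the shorter "good" when both match at the same position.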

The Journal of Supercomputing | 2016