Youngjoong Ko
Dong-a University
Publication
Featured research published by Youngjoong Ko.
Information Processing and Management | 2009
Youngjoong Ko; Jungyun Seo
Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning, generally known as supervised learning. However, supervised learning approaches have some problems. The most notable is that they require a large number of labeled training documents for accurate learning. While unlabeled documents are plentiful and easily collected, labeled documents are difficult to generate because the labeling task must be done by human developers. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts a text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier by using bootstrapping and feature projection techniques. The experimental results showed that the proposed method achieved reasonably useful performance compared to a supervised method. If the proposed method is used in a text classification task, building text classification systems will become significantly faster and less expensive.
Pattern Recognition Letters | 2008
Youngjoong Ko; Jungyun Seo
This paper proposes an effective method to extract salient sentences using contextual information and statistical approaches for text summarization. The proposed method combines two consecutive sentences into a bi-gram pseudo sentence so that contextual information is applied to statistical sentence-extraction techniques. Salient bi-gram pseudo sentences are first selected by the statistical sentence-extraction techniques, and then each selected bi-gram pseudo sentence is separated into two single sentences. The second sentence-extraction task for the separated single sentences is performed to make a final text summary. Because the proposed method uses the contextual information with the bi-gram pseudo sentences and combines the statistical sentence-extraction techniques effectively, it can achieve high performance. As a result, the proposed method showed better performance than other sentence-extraction methods in both single- and multi-document summarization.
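The two-pass idea in this abstract can be illustrated with a minimal sketch. The tokenizer, the average-term-frequency score, and the names `tokenize`, `score`, and `two_step_extract` are assumptions made for illustration; the abstract does not specify which statistical sentence-extraction techniques the paper combines.

```python
from collections import Counter

def tokenize(sentence):
    # Assumption: lowercase whitespace tokens with edge punctuation stripped.
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [w for w in words if w]

def score(tokens, term_freq):
    # Hypothetical salience score: average document-level term frequency.
    return sum(term_freq[t] for t in tokens) / len(tokens) if tokens else 0.0

def two_step_extract(sentences, n_pairs, n_final):
    term_freq = Counter(t for s in sentences for t in tokenize(s))
    # Pass 1: join consecutive sentences into bi-gram pseudo sentences and rank them.
    pairs = [(i, i + 1) for i in range(len(sentences) - 1)]
    ranked_pairs = sorted(
        pairs,
        key=lambda p: score(tokenize(sentences[p[0]]) + tokenize(sentences[p[1]]), term_freq),
        reverse=True,
    )
    # Pass 2: split the selected pairs into single sentences and rank those again.
    candidates = sorted({i for p in ranked_pairs[:n_pairs] for i in p})
    ranked = sorted(candidates, key=lambda i: score(tokenize(sentences[i]), term_freq), reverse=True)
    # Emit the summary in original document order.
    return [sentences[i] for i in sorted(ranked[:n_final])]

doc = ["Cats chase mice.", "Mice eat cheese.", "The weather is nice.", "Dogs bark loudly."]
summary = two_step_extract(doc, n_pairs=1, n_final=2)
```

Scoring a pair as one unit lets a sentence benefit from the vocabulary of its neighbor, which is one simple way to bring context into a purely statistical score.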
Information Processing Letters | 2008
Youngjoong Ko; Hongkuk An; Jungyun Seo
A (page or web) snippet is a document excerpt allowing a user to understand if a document is indeed relevant without accessing it. This paper proposes an effective snippet generation method. A statistical query expansion approach with pseudo-relevance feedback and text summarization techniques are applied to salient sentence extraction for good quality snippets. In the experimental results, the proposed method showed much better performance than other methods including those of commercial Web search engines such as Google and Naver.
Meeting of the Association for Computational Linguistics | 2009
Seon Yang; Youngjoong Ko
This paper proposes a method for automatically identifying Korean comparative sentences in text documents. We first investigate many comparative sentences with reference to previous studies and then define a set of comparative keywords from them. A sentence that contains one or more elements of the keyword set is called a comparative-sentence candidate. Finally, we use machine learning techniques to eliminate non-comparative sentences from the candidates. As a result, we achieved significant performance, an F1-score of 88.54%, in our experiments using various web documents.
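For reference, the F1-score is the harmonic mean of precision and recall. The sketch below gives the standard definition only; the function name `f1_score` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for illustration only.
example = f1_score(true_pos=8, false_pos=2, false_neg=2)
```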
Asia Information Retrieval Symposium | 2004
Wooncheol Jung; Youngjoong Ko; Jungyun Seo
Automatic text summarization aims to reduce the size of a document while preserving its content. Our summarization system is based on Two-step Sentence Extraction. Because it combines statistical methods and efficiently reduces noisy data through two steps, it can achieve high performance. In our experiments at 30% and 10% compression, our method is compared with the Title, Location, Aggregation Similarity, and DOCUSUM methods. Our method showed higher performance than the other methods.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007
Youngjoong Ko; Hongkuk An; Jungyun Seo
A (page or web) snippet is a document excerpt that allows a user to understand whether a document is indeed relevant without accessing it. This paper proposes an effective snippet generation method. The pseudo-relevance feedback technique and text summarization techniques are applied to salient sentence extraction to generate good-quality snippets. In the experimental results, the proposed method showed much better performance than other methods, including Google and Naver.
Information Processing and Management | 2007
Hyoungdong Han; Youngjoong Ko; Jungyun Seo
Automatic text classification is the problem of automatically assigning predefined categories to free text documents, thus reducing the manual labor required by traditional classification methods. When we apply binary classification to multi-class text classification, we usually use the one-against-the-rest method. In this method, if a document belongs to a particular category, the document is regarded as a positive example of that category; otherwise, it is regarded as a negative example. As a result, each category has a positive data set and a negative data set. However, this one-against-the-rest method has a problem: the documents of a negative data set are not labeled manually, while those of a positive data set are labeled by humans. Therefore, the negative data set probably includes a lot of noisy data. In this paper, we propose applying the sliding window technique and the revised EM (Expectation Maximization) algorithm to binary text classification to solve this problem. We improve binary text classification by extracting potentially noisy documents from the negative data set using the sliding window technique and removing actually noisy documents using the revised EM algorithm. The results of our experiments showed that our method achieved better performance than the original one-against-the-rest method on all the data sets and with all the classifiers used in the experiments.
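The one-against-the-rest construction described here can be sketched as follows. This shows only how the positive and negative sets are formed; the sliding window and revised EM steps are not reproduced because the abstract does not give their details. The function name `one_against_the_rest` is an assumption.

```python
def one_against_the_rest(documents, labels):
    """Build a (positive, negative) pair of document lists for every category."""
    datasets = {}
    for category in sorted(set(labels)):
        # Positive examples carry a human-assigned label for this category.
        positive = [d for d, y in zip(documents, labels) if y == category]
        # Negative examples are simply everything else; nobody verified that they
        # do not belong to this category, which is the noise source noted above.
        negative = [d for d, y in zip(documents, labels) if y != category]
        datasets[category] = (positive, negative)
    return datasets

sets = one_against_the_rest(["d1", "d2", "d3"], ["sports", "politics", "sports"])
```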
Journal of the Association for Information Science and Technology | 2015
Sungho Kim; Youngjoong Ko; Douglas W. Oard
This article explores how best to use lexical and statistical translation evidence together for cross‐language information retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine‐readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co‐occurrence in the document language provides a basis for limiting the adverse effect of translation ambiguity. Coverage statistics for NII Testbeds and Community for Information Access Research (NTCIR) queries confirm that these resources have complementary strengths. Experiments with translation evidence from a small parallel corpus indicate that even rather rough estimates of translation probabilities can yield further improvements over a strong technique for translation weighting based on using Jensen–Shannon divergence as a term‐association measure. Finally, a novel approach to posttranslation query expansion using a random walk over the Wikipedia concept link graph is shown to yield further improvements over alternative techniques for posttranslation query expansion. Evaluation results on the NTCIR‐5 English–Korean test collection show statistically significant improvements over strong baselines.
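The Jensen–Shannon divergence mentioned above has a standard definition, sketched below for two discrete distributions over the same outcomes. How the article turns it into a term-association measure is not stated in the abstract, so this is the textbook formula only; the function names are assumptions.

```python
import math

def kl_divergence(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_divergence(p, q):
    # Compare each distribution with their pointwise average m.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric and, with base-2 logarithms, bounded between 0 and 1, so it stays finite even when the two distributions do not share support.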
Pattern Recognition Letters | 2013
Sangwoo Kang; Youngjoong Ko; Jungyun Seo
The analysis of a speech act is important for dialogue understanding systems because the speech act of an utterance is closely associated with the user's intention in the utterance. This paper proposes a speech act classification model that effectively uses a two-layer hierarchical structure generated from the adjacency pair information of speech acts. The proposed model has two advantages when adding hierarchical information to speech act classification: improved classification accuracy and reduced running time in the testing phase. As a result, it achieves higher performance than other models that do not use the hierarchical structure and runs faster because Support Vector Machine classifiers can be arranged efficiently on the two-layer hierarchical structure.
Pattern Recognition Letters | 2011
Seon Yang; Youngjoong Ko
In this paper, we study how to extract comparative sentences from Korean text documents. We decompose our task into three steps: (1) collecting comparative keywords; (2) extracting comparative-sentence candidates by keyword searching; and (3) eliminating non-comparative sentences from these candidates using machine learning techniques. We perform various experiments to find relevant features. As a result, our experiments show significant performance, an F1-score of 90.23%.
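Steps (2) and (3) can be sketched as a two-stage filter. The English keywords, the substring matching, and the stand-in classifier below are assumptions for illustration; the paper works on Korean text with a learned classifier whose features are not given in the abstract.

```python
def extract_candidates(sentences, keywords):
    """Step 2: keep sentences that contain at least one comparative keyword."""
    # Assumption: plain case-insensitive substring matching.
    return [s for s in sentences if any(k in s.lower() for k in keywords)]

def filter_candidates(candidates, is_comparative):
    """Step 3: drop candidates that the classifier judges non-comparative."""
    return [s for s in candidates if is_comparative(s)]

keywords = ["than", "more"]  # hypothetical English stand-ins for the keyword set
sentences = [
    "This phone is faster than that one.",
    "I would rather stay home than go out.",
    "The sky is blue.",
]
candidates = extract_candidates(sentences, keywords)
# Stand-in for a trained model: any callable that returns True or False per sentence.
comparatives = filter_candidates(candidates, lambda s: "faster" in s)
```

Keyword search favors recall, so it admits sentences such as the second one; the classifier in step (3) is what restores precision.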