Jungyun Seo | Researchain

Publication

Featured research published by Jungyun Seo.

Information Processing and Management | 2004

Improving text categorization using the importance of sentences

Youngjoong Ko; Jinwoo Park; Jungyun Seo

Automatic text categorization is the problem of assigning text documents to predefined categories. To classify text documents, we must extract useful features. In previous research, a text document is commonly represented by the term frequency and the inverse document frequency of each feature. Because a document contains both important and unimportant sentences, features from the more important sentences should be weighted more heavily than other features. In this paper, we measure the importance of sentences using text summarization techniques. We then represent a document as a vector of features with different weights according to the importance of each sentence. To verify our new method, we conduct experiments using newsgroup data sets in two languages: one written in English and the other written in Korean. Four kinds of classifiers are used in our experiments: Naive Bayes, Rocchio, k-NN, and SVM. We observe that our new method yields a significant improvement for all of these classifiers on both data sets.
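The weighting idea in this abstract can be illustrated with a minimal sketch. It assumes the sentence importance scores are already available (the paper derives them with text summarization techniques, which are not reproduced here); the function name `weighted_term_vector`, the whitespace tokenizer, and the example scores are hypothetical choices made for illustration only.

```python
from collections import Counter

def weighted_term_vector(sentences, importance):
    """Build a term vector in which each occurrence of a term counts
    as much as the importance score of the sentence it appears in."""
    vector = Counter()
    for sentence, weight in zip(sentences, importance):
        # Naive whitespace tokenization; an assumption, not the paper's method.
        for term in sentence.lower().split():
            vector[term] += weight
    return dict(vector)

sentences = ["the match ended in a draw", "the weather was cold"]
importance = [2.0, 1.0]  # hypothetical importance scores, one per sentence
vec = weighted_term_vector(sentences, importance)
print(vec["the"], vec["match"], vec["cold"])
```

Here the term "the" appears once in each sentence and so accumulates both weights, while "match" inherits only the weight of the more important first sentence.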

International Conference on Computational Linguistics | 2000

Automatic text categorization by unsupervised learning

Youngjoong Ko; Jungyun Seo

The goal of text categorization is to classify documents into a certain number of predefined categories. Previous work in this area has used a large number of labeled training documents for supervised learning. One problem is that labeled training documents are difficult to create: while unlabeled documents are easy to collect, manually categorizing them to create training documents is not. In this paper, we propose an unsupervised learning method to overcome these difficulties. The proposed method divides the documents into sentences and categorizes each sentence using a keyword list for each category and a sentence similarity measure. It then uses the categorized sentences for training. The proposed method performs comparably to traditional supervised learning methods. Therefore, this method can be used in areas where low-cost text categorization is needed. It can also be used for creating training documents.
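As a rough sketch of the sentence-labeling step described above, the following splits a document into sentences and assigns each one to the category whose keyword list it overlaps most. This is a simplification: it uses keyword overlap only and omits the sentence similarity measure mentioned in the abstract. The keyword lists, function names, and example document are hypothetical.

```python
import re

def split_sentences(document):
    """Split on sentence-final punctuation (a crude, assumed splitter)."""
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]

def label_sentence(sentence, keywords):
    """Return the category with the largest keyword overlap, or None
    when the sentence shares no keyword with any category."""
    tokens = set(sentence.lower().split())
    scores = {cat: len(tokens & words) for cat, words in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical keyword lists, one per category.
keywords = {
    "sports": {"match", "team", "goal"},
    "weather": {"rain", "storm", "cold"},
}
doc = "The team scored a late goal. Heavy rain delayed the start. Nothing else happened."
labeled = [(s, label_sentence(s, keywords)) for s in split_sentences(doc)]
for sentence, label in labeled:
    print(label, "<-", sentence)
```

Sentences that receive a label could then serve as training examples for an ordinary classifier, while unlabeled ones (the third sentence here) are set aside.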

Information Processing and Management | 2009

Text classification from unlabeled documents with bootstrapping and feature projection techniques

Youngjoong Ko; Jungyun Seo

Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning, an approach generally known as supervised learning. However, supervised learning approaches have some problems. The most notable is that they require a large number of labeled training documents for accurate learning. While unlabeled documents are plentiful and easily collected, labeled documents are difficult to generate because the labeling must be done by humans. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts a text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier by using bootstrapping and feature projection techniques. Experimental results show that the proposed method achieves reasonably useful performance compared to a supervised method. If the proposed method is used in a text classification task, building text classification systems will become significantly faster and less expensive.

Pattern Recognition Letters | 2008

An effective sentence-extraction technique using contextual information and statistical approaches for text summarization

Youngjoong Ko; Jungyun Seo

This paper proposes an effective method to extract salient sentences using contextual information and statistical approaches for text summarization. The proposed method combines two consecutive sentences into a bi-gram pseudo sentence so that contextual information is applied to statistical sentence-extraction techniques. Salient bi-gram pseudo sentences are first selected by the statistical sentence-extraction techniques, and then each selected bi-gram pseudo sentence is separated into two single sentences. A second sentence-extraction step is then performed on the separated single sentences to produce the final text summary. Because the proposed method uses the contextual information in the bi-gram pseudo sentences and combines the statistical sentence-extraction techniques effectively, it can achieve high performance. The proposed method showed better performance than other sentence-extraction methods in both single- and multi-document summarization.
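The pairing and separation steps described above can be sketched as follows. The sketch covers only how consecutive sentences are joined into bi-gram pseudo sentences and how selected pairs are split back into single sentences; the statistical scoring that chooses the salient pairs is not reproduced, so the selected indices are supplied by hand. Function names and the example sentences are hypothetical.

```python
def bigram_pseudo_sentences(sentences):
    """Join each pair of consecutive sentences into one pseudo sentence,
    keyed by the index of the first sentence in the pair."""
    return [(i, sentences[i] + " " + sentences[i + 1])
            for i in range(len(sentences) - 1)]

def split_selected(selected_indices, sentences):
    """Separate the selected pseudo sentences back into single sentences,
    in document order and without duplicates."""
    covered = sorted({j for i in selected_indices for j in (i, i + 1)})
    return [sentences[j] for j in covered]

sentences = ["A opens.", "B follows.", "C continues.", "D closes."]
pseudo = bigram_pseudo_sentences(sentences)

# Assume a first-pass scorer (not shown) selected the pairs starting at 1 and 2.
candidates = split_selected([1, 2], sentences)
print(pseudo)
print(candidates)
```

The list `candidates` would then be the input to the second extraction pass that produces the final summary.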

IEEE Intelligent Systems | 2008

Cluster-Based FAQ Retrieval Using Latent Term Weights

Harksoo Kim; Jungyun Seo

A high-performance FAQ retrieval system uses query-log clustering to resolve lexical-disagreement problems. The proposed system outperforms traditional information-retrieval systems in FAQ retrieval.

Information & Software Technology | 2000

Implementation of an efficient requirements-analysis supporting system using similarity measure techniques

Sooyong Park; Harksoo Kim; Youngjoong Ko; Jungyun Seo

As software becomes larger and more complicated, requirements analysis becomes an important and difficult activity for software engineers. This paper proposes a system that supports informal requirements analysis. The proposed system measures the similarity between requirement sentences to identify possible redundancies and inconsistencies, and extracts possibly ambiguous requirements. The similarity measurement method combines a sliding window model and a parser model. Using these methods, the proposed system helps trace dependencies between documents and improve the quality of requirement sentences. The efficiency of the proposed system and a process for requirement specification analysis using the system are presented.

Meeting of the Association for Computational Linguistics | 2004

Learning with Unlabeled Data for Text Categorization Using a Bootstrapping and a Feature Projection Technique

Youngjoong Ko; Jungyun Seo

A wide range of supervised learning algorithms has been applied to text categorization. However, supervised learning approaches have some problems. One of them is that they require a large, often prohibitive, number of labeled training documents for accurate learning. Generally, acquiring class labels for training data is costly, while gathering a large quantity of unlabeled data is cheap. We propose a new automatic text categorization method that learns from only unlabeled data using a bootstrapping framework and a feature projection technique. In our experiments, our method showed performance reasonably comparable to that of a supervised method. If our method is used in a text categorization task, building text categorization systems will become significantly faster and less expensive.

Meeting of the Association for Computational Linguistics | 2001

MAYA: a fast Question-answering system based on a predictive answer indexer

Harksoo Kim; Kyungsun Kim; Gary Geunbae Lee; Jungyun Seo

We propose a question-answering (QA) system in Korean that uses a predictive answer indexer. The predictive answer indexer first extracts all answer candidates in a document at indexing time. It then scores the adjacent content words that are closely related to each answer candidate. Next, it stores the weighted content words with each candidate in a database. Using this technique, along with a complementary analysis of questions, the proposed QA system can reduce response time because it does not need to extract and score answer candidates at retrieval time. If the QA system is combined with a traditional information retrieval system, it can improve document retrieval precision for closed-class questions with minimal loss of retrieval time.
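The index-then-lookup flow described above can be sketched in a few lines. Everything specific here is an assumption made for illustration and is not taken from MAYA: answer candidates are restricted to four-digit years found by a regular expression, nearby words are scored by inverse distance within a fixed window, no stop-word filtering is applied, and an in-memory dictionary stands in for the database.

```python
import re
from collections import defaultdict

def build_index(documents, window=3):
    """Indexing time: find answer candidates and store weighted
    neighboring words as word -> {candidate: score}."""
    index = defaultdict(lambda: defaultdict(float))
    for doc in documents:
        tokens = re.findall(r"\w+", doc.lower())
        for pos, tok in enumerate(tokens):
            if not re.fullmatch(r"\d{4}", tok):  # assumed candidate pattern
                continue
            lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
            for j in range(lo, hi):
                if j != pos:
                    # Closer words get higher scores (assumed weighting).
                    index[tokens[j]][tok] += 1.0 / abs(j - pos)
    return index

def answer(question, index):
    """Retrieval time: sum the stored scores; no candidate extraction needed."""
    scores = defaultdict(float)
    for word in re.findall(r"\w+", question.lower()):
        for candidate, score in index.get(word, {}).items():
            scores[candidate] += score
    return max(scores, key=scores.get) if scores else None

docs = ["The bridge opened in 1932 after years of work",
        "The tower was completed in 1889"]
index = build_index(docs)
print(answer("When was the tower completed", index))
print(answer("When was the bridge opened", index))
```

The design point this illustrates is that all candidate extraction and scoring happen in `build_index`, so `answer` reduces to dictionary lookups and additions.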

Information Processing and Management | 2004

Using the feature projection technique based on a normalized voting method for text classification

Youngjoong Ko; Jungyun Seo

This paper proposes a new approach for text categorization, based on a feature projection technique. In our approach, training data are represented as the projections of training documents on each feature. The voting for a classification is processed on the basis of individual feature projections. The final classification of a test document is determined by majority voting over the individual classifications of each feature. Our empirical results show that the proposed approach, text categorization using feature projections (TCFP), outperforms k-NN, Rocchio, and Naive Bayes. Above all, TCFP is fast: up to one hundred times faster than k-NN on the Newsgroups data set. It is also robust to noisy data. Since the TCFP algorithm is very simple, its implementation and training process are straightforward. For these reasons, TCFP can be a useful classifier in text categorization tasks that need fast execution speed, robustness, and high performance.
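A simplified sketch of per-feature voting follows. It is not the authors' exact TCFP algorithm: the projection of the training data on a feature is reduced to per-category occurrence counts, and each feature of a test document casts a vote that is normalized to sum to one across categories. The function names and toy training data are hypothetical.

```python
from collections import defaultdict

def train(docs):
    """docs: list of (tokens, category) pairs.
    Returns feature -> {category: occurrence count}."""
    projections = defaultdict(lambda: defaultdict(float))
    for tokens, category in docs:
        for token in tokens:
            projections[token][category] += 1.0
    return projections

def classify(tokens, projections):
    """Each known feature votes for categories in proportion to its
    normalized counts; the category with the most votes wins."""
    votes = defaultdict(float)
    for token in tokens:
        if token not in projections:
            continue  # unseen features cast no vote
        total = sum(projections[token].values())
        for category, weight in projections[token].items():
            votes[category] += weight / total
    return max(votes, key=votes.get) if votes else None

training = [
    (["goal", "team"], "sports"),
    (["rain", "cold"], "weather"),
    (["team", "rain"], "sports"),
]
model = train(training)
print(classify(["team", "rain"], model))
print(classify(["cold", "rain"], model))
```

Because classification only touches the features present in the test document, its cost does not grow with the number of training documents, which is consistent with the speed advantage over k-NN reported in the abstract.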

International Conference on Computational Linguistics | 2002

Automatic text categorization using the importance of sentences

Youngjoong Ko; Jinwoo Park; Jungyun Seo

Automatic text categorization is the problem of automatically assigning text documents to predefined categories. To classify text documents, we must extract good features from them. In previous research, a text document is commonly represented by the term frequency and the inverse document frequency of each feature. Because a document contains both important and unimportant sentences, features from the more important sentences should be weighted more heavily than other features. In this paper, we measure the importance of sentences using text summarization techniques. A document is then represented as a vector of features with different weights according to the importance of each sentence. To verify our new method, we conducted experiments on newsgroup data sets in two languages: one written in English and the other written in Korean. Four kinds of classifiers were used in our experiments: Naive Bayes, Rocchio, k-NN, and SVM. We observed that our new method yielded a significant improvement for all classifiers on both data sets.
