Taeho Jo
University of Ottawa
Publication
Featured research published by Taeho Jo.
international symposium on neural networks | 2007
Taeho Jo; Malrey Lee
This study proposes an innovative measure for evaluating the performance of text clustering. When the K-means algorithm or Kohonen Networks are used for text clustering, the number of clusters is fixed initially by configuring it as a parameter, whereas when the single pass algorithm is used, the number of clusters is not predictable. Using labeled documents, the result of text clustering with the K-means algorithm or Kohonen Networks can be evaluated by setting the number of clusters to the number of given target categories, mapping each cluster to a target category, and applying the evaluation measures of text categorization. But with the single pass algorithm, if the number of clusters differs from the number of target categories, such measures are useless for evaluating the clustering result. This study therefore proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, which is called CI (Clustering Index) in this article.
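The exact definition of CI is given in the paper; as a minimal sketch, assuming cosine similarity over term-frequency document vectors and a harmonic-mean combination of cohesion (intra-cluster similarity) and separation (one minus inter-cluster similarity), the measure might be computed as follows. The function names and the combination rule are illustrative assumptions, not the paper's formula.

import numpy as np
from itertools import combinations

def cosine(a, b):
    # cosine similarity between two document vectors
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def intra_cluster_similarity(cluster):
    # average pairwise similarity of documents within one cluster
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0  # treat a singleton cluster as maximally cohesive
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

def inter_cluster_similarity(c1, c2):
    # average similarity of document pairs drawn from two different clusters
    sims = [cosine(a, b) for a in c1 for b in c2]
    return sum(sims) / len(sims)

def clustering_index(clusters):
    # hypothetical CI: harmonic mean of cohesion and separation
    intra = float(np.mean([intra_cluster_similarity(c) for c in clusters]))
    inters = [inter_cluster_similarity(a, b) for a, b in combinations(clusters, 2)]
    separation = 1.0 - (float(np.mean(inters)) if inters else 0.0)
    return 2 * intra * separation / (intra + separation) if (intra + separation) else 0.0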
international conference on hybrid information technology | 2006
Taeho Jo; Malrey Lee; Thomas M. Gatton
A document surrogate is usually represented as a list of words. Because not all words in a document reflect its content, it is necessary to select important words from the document that relate to its content. Such important words are called keywords and are traditionally selected with an equation based on Term Frequency (TF) and Inverse Document Frequency (IDF). Additionally, the position of each word in the document and the inclusion of the word in the title should be considered when selecting keywords from the words contained in the text. An equation based on all of these factors becomes too complicated to apply to keyword selection. This paper proposes a backpropagation neural network model in which these factors are used as features to generate feature vectors for selecting keywords. This paper shows that the proposed backpropagation approach outperforms the equation-based approach in distinguishing keywords.
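The abstract lists TF, IDF, word position, and title inclusion as the factors; a minimal sketch of turning those factors into feature vectors and training a small backpropagation network (here scikit-learn's MLPClassifier as a stand-in for the paper's own network) might look like the following. The toy corpus and the exact feature definitions are assumptions.

import math
from sklearn.neural_network import MLPClassifier

def word_features(word, doc_words, title_words, doc_freq, n_docs):
    # hypothetical feature vector: TF, IDF, relative first position, title flag
    tf = doc_words.count(word) / len(doc_words)
    idf = math.log((1 + n_docs) / (1 + doc_freq.get(word, 0)))
    position = doc_words.index(word) / len(doc_words)
    in_title = 1.0 if word in title_words else 0.0
    return [tf, idf, position, in_title]

# toy corpus: (title words, body words, manually annotated keywords)
corpus = [
    (["neural", "networks"],
     ["neural", "networks", "learn", "weights", "from", "data"],
     {"neural", "networks"}),
    (["text", "clustering"],
     ["text", "clustering", "groups", "documents", "by", "content"],
     {"clustering", "documents"}),
]

doc_freq = {}
for _, body, _ in corpus:
    for w in set(body):
        doc_freq[w] = doc_freq.get(w, 0) + 1

X, y = [], []
for title, body, keywords in corpus:
    for w in set(body):
        X.append(word_features(w, body, title, doc_freq, len(corpus)))
        y.append(1 if w in keywords else 0)

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict(X))  # 1 marks a word predicted to be a keyword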
international conference on intelligent computing | 2006
Malrey Lee; Taeho Jo
This paper proposes an alternative version of SVM that uses string vectors as its input data. A string vector is defined as a finite ordered set of words. Almost all machine learning algorithms, including traditional versions of SVM, use numerical vectors as their input data. In order to apply such machine learning algorithms to classification or clustering, any type of raw data must be encoded into numerical vectors. In text categorization or text clustering, representing documents as numerical vectors leads to two main problems: huge dimensionality and sparse distribution. Although traditional versions of SVM are tolerant of huge dimensionality, they are not robust to sparse distribution in their training and classification. Therefore, in order to avoid this problem, this research proposes another version of SVM, which uses string vectors as an alternative type of structured data to numerical vectors. To apply the proposed version of SVM to text categorization, documents are encoded into string vectors by defining conditions on words as the features and placing the words satisfying those conditions into the corresponding positions of each string vector. A word-by-word similarity matrix, which defines the semantic similarities of words, should be built from a given training corpus before using the proposed version of SVM for text categorization. Each semantic similarity, given as an element of the matrix, is computed based on the collocation of two words within the same documents over a given corpus. In this paper, the proposed kernel function, the inner product of two string vectors, indicates the average semantic similarity over their elements. To validate the proposed version of SVM, it is compared with a traditional version of SVM using a linear kernel function in text categorization on two test beds.
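The abstract specifies that the kernel is the inner product of two string vectors, interpreted as the average semantic similarity of their corresponding words, with word-to-word similarities derived from collocation in the same documents. A minimal sketch under those assumptions follows; the Jaccard-style collocation measure and the handling of out-of-vocabulary words are guesses, not the paper's definitions.

import numpy as np

def build_similarity_matrix(corpus_docs):
    # word-by-word semantic similarity from document-level co-occurrence
    vocab = sorted({w for doc in corpus_docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    occur = np.zeros((len(vocab), len(corpus_docs)))
    for j, doc in enumerate(corpus_docs):
        for w in set(doc):
            occur[index[w], j] = 1.0
    co = occur @ occur.T                     # documents shared by each word pair
    counts = np.diag(co)                     # documents containing each word
    denom = np.maximum(np.add.outer(counts, counts) - co, 1)
    sim = co / denom                         # Jaccard-style collocation score
    return vocab, index, sim

def string_vector_kernel(sv1, sv2, index, sim):
    # kernel value: average similarity between corresponding words of two string vectors
    vals = []
    for w1, w2 in zip(sv1, sv2):
        if w1 in index and w2 in index:
            vals.append(sim[index[w1], index[w2]])
        else:
            vals.append(0.0)                 # unknown words contribute no similarity
    return float(np.mean(vals)) if vals else 0.0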
software engineering research and applications | 2007
Taeho Jo; Malrey Lee
This research proposes a new strategy for text categorization in which documents are encoded into string vectors, together with modified versions of SVM adapted to string vectors. Traditionally, when supervised machine learning algorithms are used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the given application area of pattern classification. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and apply the modified SVM to them for text categorization.
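The abstract does not spell out which word conditions define the positions of a string vector; one plausible sketch, assuming position i simply holds the i-th most frequent word of the document, is shown below.

from collections import Counter

def encode_string_vector(doc_words, dim=10):
    # hypothetical encoding: position i holds the i-th most frequent word;
    # the paper may define other word conditions as the features
    ranked = [w for w, _ in Counter(doc_words).most_common(dim)]
    # pad with an empty-word marker so every string vector has the same length
    return ranked + [""] * (dim - len(ranked))

doc = "the svm is applied to string vectors instead of numerical vectors".split()
print(encode_string_vector(doc, dim=5))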
international symposium on neural networks | 2007
Taeho Jo; Malrey Lee
This paper attempts to evaluate machine learning-based approaches to text categorization, including NTC, without decomposing it into binary classification problems, and presents another learning scheme for NTC. In previous research on text categorization, state-of-the-art approaches have been evaluated by decomposing text categorization into binary classification problems. With such decomposition, it becomes complicated and expensive to implement text categorization systems using machine learning algorithms. The alternative learning scheme of NTC presented in this paper is unconditional learning, in which the weights of words stored in its learning layer are updated whenever a training example is presented, whereas its previous learning scheme is mistake-driven learning, in which the weights of words are updated only when a training example is misclassified. This research identifies the advantages and disadvantages of both learning schemes by comparing them with each other.
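The internal architecture of NTC is defined in the authors' earlier work; the sketch below only contrasts when the two learning schemes fire an update, assuming a simple per-category word-weight table and an additive update rule (both assumptions for illustration).

def train_word_weights(examples, categories, scheme="unconditional", rate=0.1):
    # examples: list of (words, true_category); weights[category][word] accumulate evidence
    weights = {c: {} for c in categories}

    def classify(words):
        scores = {c: sum(weights[c].get(w, 0.0) for w in words) for c in categories}
        return max(scores, key=scores.get)

    for words, label in examples:
        predicted = classify(words)
        # unconditional learning updates on every example;
        # mistake-driven learning updates only when the example is misclassified
        if scheme == "unconditional" or predicted != label:
            for w in words:
                weights[label][w] = weights[label].get(w, 0.0) + rate
                if predicted != label:
                    weights[predicted][w] = weights[predicted].get(w, 0.0) - rate
    return weights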
Journal of Convergence Information Technology | 2007
Taeho Jo; Malrey Lee; Yigon Kim
international conference on internet computing | 2007
Taeho Jo; Malrey Lee; Eungyeong Kim
international conference on artificial intelligence | 2007
Taeho Jo; Malrey Lee
SWWS | 2007
Keun-Sang Yi; Taeho Jo; Malrey Lee; Young-Keun Choi
MLMTA | 2007
Taeho Jo; Malrey Lee; Thomas M. Gatton