Taeho Jo
University of Ottawa
Publication
Featured research published by Taeho Jo.
international symposium on neural networks | 2007
Taeho Jo; Malrey Lee
This study proposes an innovative measure for evaluating the performance of text clustering. When the K-means algorithm or Kohonen Networks are used for text clustering, the number of clusters is fixed initially by configuring it as a parameter, whereas when the single pass algorithm is used, the number of clusters is not predictable. Using labeled documents, the result of text clustering with the K-means algorithm or Kohonen Networks can be evaluated by setting the number of clusters to the number of given target categories, mapping each cluster to a target category, and applying the evaluation measures of text categorization. But with the single pass algorithm, if the number of clusters differs from the number of target categories, such measures are useless for evaluating the clustering result. This study therefore proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, which is called CI (Clustering Index) in this article.
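The exact definition of CI is given in the paper; as a minimal sketch, assuming cosine similarity over term-frequency document vectors and a harmonic-mean combination of cohesion (intra-cluster similarity) and separation (one minus inter-cluster similarity), the measure might be computed as follows. The function names and the combination rule are illustrative assumptions, not the paper's formula.

import numpy as np
from itertools import combinations

def cosine(a, b):
    # cosine similarity between two document vectors
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def intra_cluster_similarity(cluster):
    # average pairwise similarity of documents within one cluster
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0  # treat a singleton cluster as maximally cohesive
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

def inter_cluster_similarity(c1, c2):
    # average similarity of document pairs drawn from two different clusters
    sims = [cosine(a, b) for a in c1 for b in c2]
    return sum(sims) / len(sims)

def clustering_index(clusters):
    # hypothetical CI: harmonic mean of cohesion and separation
    intra = float(np.mean([intra_cluster_similarity(c) for c in clusters]))
    inters = [inter_cluster_similarity(a, b) for a, b in combinations(clusters, 2)]
    separation = 1.0 - (float(np.mean(inters)) if inters else 0.0)
    return 2 * intra * separation / (intra + separation) if (intra + separation) else 0.0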
international conference on hybrid information technology | 2006
Taeho Jo; Malrey Lee; Thomas M. Gatton
A document surrogate is usually represented as a list of words. Because not all words in a document reflect its content, it is necessary to select important words from the document that relate to its content. Such important words are called keywords and are traditionally selected with an equation based on Term Frequency (TF) and Inverse Document Frequency (IDF). Additionally, the position of each word in the document and the inclusion of the word in the title should be considered when selecting keywords from the words contained in the text. An equation based on all of these factors becomes too complicated to apply to keyword selection. This paper proposes a backpropagation neural network model in which these factors are used as features to generate feature vectors for selecting keywords. This paper shows that the proposed backpropagation approach outperforms the equation-based approach in distinguishing keywords.
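The abstract lists TF, IDF, word position, and title inclusion as the factors; a minimal sketch of turning those factors into feature vectors and training a small backpropagation network (here scikit-learn's MLPClassifier as a stand-in for the paper's own network) might look like the following. The toy corpus and the exact feature definitions are assumptions.

import math
from sklearn.neural_network import MLPClassifier

def word_features(word, doc_words, title_words, doc_freq, n_docs):
    # hypothetical feature vector: TF, IDF, relative first position, title flag
    tf = doc_words.count(word) / len(doc_words)
    idf = math.log((1 + n_docs) / (1 + doc_freq.get(word, 0)))
    position = doc_words.index(word) / len(doc_words)
    in_title = 1.0 if word in title_words else 0.0
    return [tf, idf, position, in_title]

# toy corpus: (title words, body words, manually annotated keywords)
corpus = [
    (["neural", "networks"],
     ["neural", "networks", "learn", "weights", "from", "data"],
     {"neural", "networks"}),
    (["text", "clustering"],
     ["text", "clustering", "groups", "documents", "by", "content"],
     {"clustering", "documents"}),
]

doc_freq = {}
for _, body, _ in corpus:
    for w in set(body):
        doc_freq[w] = doc_freq.get(w, 0) + 1

X, y = [], []
for title, body, keywords in corpus:
    for w in set(body):
        X.append(word_features(w, body, title, doc_freq, len(corpus)))
        y.append(1 if w in keywords else 0)

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict(X))  # 1 marks a word predicted to be a keyword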
international conference on intelligent computing | 2006
Malrey Lee; Taeho Jo
This paper proposes an alternative version of SVM that uses string vectors as its input data. A string vector is defined as a finite ordered set of words. Almost all machine learning algorithms, including traditional versions of SVM, use numerical vectors as their input data. In order to apply such machine learning algorithms to classification or clustering, any type of raw data must be encoded into numerical vectors. In text categorization or text clustering, representing documents as numerical vectors leads to two main problems: huge dimensionality and sparse distribution. Although traditional versions of SVM are tolerant of huge dimensionality, they are not robust to sparse distribution in their training and classification. Therefore, in order to avoid this problem, this research proposes another version of SVM, which uses string vectors as an alternative type of structured data to numerical vectors. To apply the proposed version of SVM to text categorization, documents are encoded into string vectors by defining conditions on words as the features and placing the words satisfying those conditions into the corresponding positions of each string vector. A word-by-word similarity matrix, which defines the semantic similarities of words, should be built from a given training corpus before using the proposed version of SVM for text categorization. Each semantic similarity, given as an element of the matrix, is computed based on the collocation of two words within the same documents over a given corpus. In this paper, the proposed kernel function, the inner product of two string vectors, indicates the average semantic similarity over their elements. To validate the proposed version of SVM, it is compared with a traditional version of SVM using a linear kernel function in text categorization on two test beds.
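The abstract specifies that the kernel is the inner product of two string vectors, interpreted as the average semantic similarity of their corresponding words, with word-to-word similarities derived from collocation in the same documents. A minimal sketch under those assumptions follows; the Jaccard-style collocation measure and the handling of out-of-vocabulary words are guesses, not the paper's definitions.

import numpy as np

def build_similarity_matrix(corpus_docs):
    # word-by-word semantic similarity from document-level co-occurrence
    vocab = sorted({w for doc in corpus_docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    occur = np.zeros((len(vocab), len(corpus_docs)))
    for j, doc in enumerate(corpus_docs):
        for w in set(doc):
            occur[index[w], j] = 1.0
    co = occur @ occur.T                     # documents shared by each word pair
    counts = np.diag(co)                     # documents containing each word
    denom = np.maximum(np.add.outer(counts, counts) - co, 1)
    sim = co / denom                         # Jaccard-style collocation score
    return vocab, index, sim

def string_vector_kernel(sv1, sv2, index, sim):
    # kernel value: average similarity between corresponding words of two string vectors
    vals = []
    for w1, w2 in zip(sv1, sv2):
        if w1 in index and w2 in index:
            vals.append(sim[index[w1], index[w2]])
        else:
            vals.append(0.0)                 # unknown words contribute no similarity
    return float(np.mean(vals)) if vals else 0.0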
software engineering research and applications | 2007
Taeho Jo; Malrey Lee
This research proposes a new strategy for text categorization in which documents are encoded into string vectors, together with modified versions of SVM adapted to string vectors. Traditionally, when supervised machine learning algorithms are used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the given application area of pattern classification. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and apply the modified SVM to them for text categorization.
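The abstract does not spell out which word conditions define the positions of a string vector; one plausible sketch, assuming position i simply holds the i-th most frequent word of the document, is shown below.

from collections import Counter

def encode_string_vector(doc_words, dim=10):
    # hypothetical encoding: position i holds the i-th most frequent word;
    # the paper may define other word conditions as the features
    ranked = [w for w, _ in Counter(doc_words).most_common(dim)]
    # pad with an empty-word marker so every string vector has the same length
    return ranked + [""] * (dim - len(ranked))

doc = "the svm is applied to string vectors instead of numerical vectors".split()
print(encode_string_vector(doc, dim=5))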
international symposium on neural networks | 2007
Taeho Jo; Malrey Lee
This paper attempts to evaluate machine learning-based approaches to text categorization, including NTC, without decomposing it into binary classification problems, and presents another learning scheme for NTC. In previous research on text categorization, state-of-the-art approaches have been evaluated by decomposing text categorization into binary classification problems. With such decomposition, it becomes complicated and expensive to implement text categorization systems using machine learning algorithms. The alternative learning scheme of NTC presented in this paper is unconditional learning, in which the weights of words stored in its learning layer are updated whenever a training example is presented, whereas its previous learning scheme is mistake-driven learning, in which the weights of words are updated only when a training example is misclassified. This research identifies the advantages and disadvantages of both learning schemes by comparing them with each other.
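The internal architecture of NTC is defined in the authors' earlier work; the sketch below only contrasts when the two learning schemes fire an update, assuming a simple per-category word-weight table and an additive update rule (both assumptions for illustration).

def train_word_weights(examples, categories, scheme="unconditional", rate=0.1):
    # examples: list of (words, true_category); weights[category][word] accumulate evidence
    weights = {c: {} for c in categories}

    def classify(words):
        scores = {c: sum(weights[c].get(w, 0.0) for w in words) for c in categories}
        return max(scores, key=scores.get)

    for words, label in examples:
        predicted = classify(words)
        # unconditional learning updates on every example;
        # mistake-driven learning updates only when the example is misclassified
        if scheme == "unconditional" or predicted != label:
            for w in words:
                weights[label][w] = weights[label].get(w, 0.0) + rate
                if predicted != label:
                    weights[predicted][w] = weights[predicted].get(w, 0.0) - rate
    return weights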
Journal of Convergence Information Technology | 2007
Taeho Jo; Malrey Lee; Yigon Kim
international conference on internet computing | 2007
Taeho Jo; Malrey Lee; Eungyeong Kim
international conference on artificial intelligence | 2007
Taeho Jo; Malrey Lee
SWWS | 2007
Keun-Sang Yi; Taeho Jo; Malrey Lee; Young-Keun Choi
MLMTA | 2007
Taeho Jo; Malrey Lee; Thomas M. Gatton