John D. Holt | Researchain

Publication

Featured research published by John D. Holt.

Data and Knowledge Engineering | 2008

Text document clustering based on frequent word meaning sequences

Yanjun Li; Soon Myoung Chung; John D. Holt

Most existing text clustering algorithms use the vector space model, which treats documents as bags of words. Thus, word sequences in the documents are ignored, while the meaning of natural languages strongly depends on them. In this paper, we propose two new text clustering algorithms, named Clustering based on Frequent Word Sequences (CFWS) and Clustering based on Frequent Word Meaning Sequences (CFWMS). A word is the word form appearing in the document, and a word meaning is the concept expressed by synonymous word forms. A word (meaning) sequence is frequent if it occurs in more than a certain percentage of the documents in the text database. The frequent word (meaning) sequences can provide compact and valuable information about those text documents. For experiments, we used the Reuters-21578 text collection, CISI documents of the Classic data set [Classic data set, ftp://ftp.cs.cornell.edu/pub/smart/], and a corpus of the Text Retrieval Conference (TREC) [High Accuracy Retrieval from Documents (HARD) Track of Text Retrieval Conference, 2004]. Our experimental results show that CFWS and CFWMS have much better clustering accuracy than Bisecting k-means (BKM) [M. Steinbach, G. Karypis, V. Kumar, A Comparison of Document Clustering Techniques, KDD-2000 Workshop on Text Mining, 2000], a modified bisecting k-means using background knowledge (BBK) [A. Hotho, S. Staab, G. Stumme, Ontologies improve text document clustering, in: Proceedings of the 3rd IEEE International Conference on Data Mining, 2003, pp. 541-544] and Frequent Itemset-based Hierarchical Clustering (FIHC) [B.C.M. Fung, K. Wang, M. Ester, Hierarchical document clustering using frequent itemsets, in: Proceedings of SIAM International Conference on Data Mining, 2003] algorithms.
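
The frequency criterion in the abstract above (a sequence is frequent if it occurs in more than a certain percentage of the documents) can be illustrated with a minimal Python sketch. It is not the CFWS algorithm itself: it assumes whitespace tokenization and contiguous fixed-length sequences, which the abstract does not specify, and the function name `frequent_word_sequences` and the sample documents are hypothetical.

```python
from collections import defaultdict

def frequent_word_sequences(docs, length, min_fraction):
    """Return word sequences of the given length that occur in more than
    min_fraction of the documents, mapped to the IDs of those documents.

    Assumption: sequences are contiguous runs of words; this is a
    simplification made for illustration only.
    """
    doc_ids = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for i in range(len(words) - length + 1):
            doc_ids[tuple(words[i:i + length])].add(doc_id)
    threshold = min_fraction * len(docs)
    # Document frequency counts each document once, however often
    # the sequence repeats inside it.
    return {seq: ids for seq, ids in doc_ids.items() if len(ids) > threshold}

docs = [
    "stock prices rise on strong earnings",
    "oil prices rise after supply cut",
    "strong earnings lift stock prices",
    "central bank holds interest rates",
]
frequent = frequent_word_sequences(docs, length=2, min_fraction=0.4)
for seq, ids in sorted(frequent.items()):
    print(" ".join(seq), sorted(ids))
```

The document sets attached to each frequent sequence are the kind of compact information a clustering step could start from; how CFWS actually forms clusters from them is not stated in the abstract.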

Information Processing Letters | 2002

Mining association rules using inverted hashing and pruning

John D. Holt; Soon Myoung Chung

In this paper, we propose a new algorithm named Inverted Hashing and Pruning (IHP) for mining association rules between items in transaction databases. The performance of the IHP algorithm was evaluated for various cases and compared with that of two well-known mining algorithms, the Apriori algorithm [Proc. 20th VLDB Conf., 1994, pp. 487-499] and the Direct Hashing and Pruning algorithm [IEEE Trans. on Knowledge Data Engrg. 9 (5) (1997) 813-825]. It has been shown that the IHP algorithm has better performance for databases with long transactions.
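
The abstract does not say how IHP works internally. The Python sketch below shows one way a hash table over transaction IDs could be used to prune candidate itemsets before counting them; treat it as an assumption about the general idea, not as the published algorithm. The hash function (`tid % num_slots`), the function names, and the sample transactions are all hypothetical.

```python
from itertools import combinations

def build_tid_hash_tables(transactions, num_slots):
    """For each item, count how many of its transactions hash to each slot."""
    tables = {}
    for tid, items in enumerate(transactions):
        slot = tid % num_slots  # hypothetical hash on the transaction ID
        for item in set(items):
            tables.setdefault(item, [0] * num_slots)[slot] += 1
    return tables

def support_upper_bound(itemset, tables, num_slots):
    """Sum over slots of the smallest per-item count.

    A transaction containing every item adds one to each item's count in
    the same slot, so this sum can never be below the true support.
    """
    return sum(min(tables[item][s] for item in itemset) for s in range(num_slots))

def exact_support(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for items in transactions if set(itemset) <= set(items))

transactions = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "d"],
    ["a", "b", "c"],
    ["c", "d"],
    ["a", "d"],
]
num_slots = 3
min_support = 2
tables = build_tid_hash_tables(transactions, num_slots)

# Candidates whose upper bound is already below min_support are dropped
# without scanning the database for their exact count.
survivors = [pair for pair in combinations(sorted(tables), 2)
             if support_upper_bound(pair, tables, num_slots) >= min_support]
print(survivors)
```

With these sample transactions the pairs `("b", "d")` and `("c", "d")` are pruned by the bound alone, while `("a", "d")` survives the bound but would still fail the exact count. The bound is safe (it never prunes a frequent itemset) but not tight.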

Knowledge and Information Systems | 2001

Multipass algorithms for mining association rules in text databases

John D. Holt; Soon Myoung Chung

In this paper, we propose two new algorithms for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. Two well-known mining algorithms, the Apriori algorithm and the Direct Hashing and Pruning (DHP) algorithm, are evaluated in the context of mining text databases, and are compared with the newly proposed algorithms named Multipass-Apriori (M-Apriori) and Multipass-DHP (M-DHP). It has been shown that the proposed algorithms have better performance for large text databases.
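
For readers unfamiliar with the baseline, here is a minimal Python sketch of level-wise (Apriori-style) mining of frequent word sets, treating each document as a transaction of distinct words. It illustrates the baseline the abstract compares against, not M-Apriori or M-DHP, whose details the abstract does not give. The function names and the sample documents are hypothetical.

```python
from itertools import combinations

def apriori_word_sets(docs, min_support):
    """Level-wise search for word sets that occur in at least min_support
    documents. Returns {frozenset(words): document count}."""
    doc_words = [set(text.lower().split()) for text in docs]
    candidates = {frozenset([w]) for words in doc_words for w in words}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for words in doc_words if c <= words)
                  for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-sets into k-sets and keep only those whose
        # (k-1)-subsets are all frequent (the Apriori pruning rule).
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
    return frequent

def rule_confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent; both sets must be frequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return frequent[both] / frequent[frozenset(antecedent)]

docs = [
    "data mining finds association rules",
    "association rules describe word pairs",
    "text mining uses association rules",
    "clustering groups similar text",
]
frequent = apriori_word_sets(docs, min_support=2)
print(len(frequent))
print(rule_confidence(frequent, {"mining"}, {"association"}))
```

Even in this toy corpus the first pass must count every distinct word, which hints at why vocabularies of real text databases make candidate counting expensive.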

Conference on Information and Knowledge Management | 1999

Efficient mining of association rules in text databases

John D. Holt; Soon Myoung Chung

In this paper, we propose two new algorithms for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. Two well-known mining algorithms, the Apriori algorithm and the Direct Hashing and Pruning (DHP) algorithm, are evaluated in the context of mining text databases, and are compared with the newly proposed algorithms named Multipass-Apriori (M-Apriori) and Multipass-DHP (M-DHP). It has been shown that the proposed algorithms have better performance for large text databases.

International Parallel and Distributed Processing Symposium | 2004

Parallel mining of association rules from text databases on a cluster of workstations

John D. Holt; Soon Myoung Chung

Summary form only given. We propose a new algorithm named Parallel Multipass with Inverted Hashing and Pruning (PMIHP) for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. The new PMIHP algorithm is a parallel version of our Multipass with Inverted Hashing and Pruning (MIHP) algorithm, which was shown to be more efficient than other existing algorithms in the context of mining text databases. The PMIHP algorithm reduces the overhead of communication between miners running on different processors because they mine local databases asynchronously and prune the global candidates by using the inverted hashing and pruning technique.

International Conference on Tools with Artificial Intelligence | 2002

Mining association rules in text databases using multipass with inverted hashing and pruning

John D. Holt; Soon Myoung Chung

In this paper, we propose a new algorithm named Multipass with Inverted Hashing and Pruning (MIHP) for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. Two well-known mining algorithms, the Apriori algorithm and the Direct Hashing and Pruning (DHP) algorithm, are evaluated in the context of mining text databases, and are compared with the proposed MIHP algorithm. It has been shown that the MIHP algorithm performs better for large text databases.

International Conference on Tools with Artificial Intelligence | 2007

Usage of Mined Word Associations for Text Retrieval

John D. Holt; Soon Myoung Chung; Yanjun Li

In this paper, we evaluated the efficacy of mined association rules between words for measuring the similarity between documents to enhance text retrieval. In our experiments, for each document relevant to a query, we formed a group of documents having at least one frequent set of words in common with the answer document. Then we measured the precision of the documents in the same group as an answer set to the corresponding query. This experiment was performed using a corpus of the Text Retrieval Conference (TREC) and search results. Our experimental results show that the frequent sets of words mined from our test database are useful in ranking query result sets to improve the precision of retrieval.
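
The grouping and precision measurement described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: documents are reduced to sets of words, the answer document is excluded from its own group, and the document IDs, word sets, and relevance judgments are invented for the example; none of these details come from the abstract.

```python
def group_for(answer_id, doc_words, frequent_sets):
    """Documents sharing at least one frequent word set with the answer
    document (the answer document itself is left out, by assumption)."""
    shared = [s for s in frequent_sets if s <= doc_words[answer_id]]
    return {doc_id for doc_id, words in doc_words.items()
            if doc_id != answer_id and any(s <= words for s in shared)}

def precision(group, relevant):
    """Fraction of the group's documents that are relevant to the query."""
    return len(group & relevant) / len(group) if group else 0.0

# Hypothetical corpus, mined word sets, and relevance judgments.
doc_words = {
    "d1": {"oil", "price", "opec", "supply"},
    "d2": {"oil", "price", "market"},
    "d3": {"opec", "supply", "meeting"},
    "d4": {"price", "market", "stock"},
    "d5": {"football", "match"},
}
frequent_sets = [
    frozenset({"oil", "price"}),
    frozenset({"opec", "supply"}),
    frozenset({"price", "market"}),
]
relevant = {"d1", "d2", "d3"}

for answer_id in sorted(relevant):
    group = group_for(answer_id, doc_words, frequent_sets)
    print(answer_id, sorted(group), precision(group, relevant))
```

In this toy data the group built around `d1` contains only relevant documents, while the group around `d2` also pulls in the non-relevant `d4` through the shared set `{"price", "market"}`.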

Data Warehousing and Knowledge Discovery | 2000

Mining of Association Rules in Text Databases Using Inverted Hashing and Pruning

John D. Holt; Soon Myoung Chung

In this paper, we propose a new algorithm named Inverted Hashing and Pruning (IHP) for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. Two well-known mining algorithms, the Apriori algorithm [1] and the Direct Hashing and Pruning (DHP) algorithm [5], are evaluated in the context of mining text databases, and are compared with the proposed IHP algorithm. It has been shown that the IHP algorithm has better performance for large text databases.

The Journal of Supercomputing | 2007

Parallel mining of association rules from text databases

John D. Holt; Soon Myoung Chung

Archive | 2003

Efficient sequential and parallel algorithms for mining association rules in text databases

Soon Myoung Chung; John D. Holt

Collaboration

Dive into John D. Holt's collaborations.

Top Co-Authors

Soon Myoung Chung

Wright State University

Yanjun Li

Fordham University
