Thomas E. Potok | Researchain

Publication

Featured research published by Thomas E. Potok.

IEEE Swarm Intelligence Symposium | 2005

Document clustering using particle swarm optimization

Xiaohui Cui; Thomas E. Potok; Paul J. Palathingal

Fast and high-quality document clustering algorithms play an important role in effectively navigating, summarizing, and organizing information. Recent studies have shown that partitional clustering algorithms are more suitable for clustering large datasets. However, the K-means algorithm, the most commonly used partitional clustering algorithm, can only generate a locally optimal solution. In this paper, we present a particle swarm optimization (PSO) document clustering algorithm. In contrast to the localized search of the K-means algorithm, the PSO clustering algorithm performs a global search of the entire solution space. In our experiments, we applied the PSO, K-means, and hybrid PSO clustering algorithms to four different text document datasets. The number of documents in the datasets ranges from 204 to over 800, and the number of terms ranges from over 5000 to over 7000. The results illustrate that the hybrid PSO algorithm can generate more compact clustering results than the K-means algorithm.
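
As a rough illustration of the idea above, the following Python sketch encodes each particle as k candidate centroids laid end to end and scores it by the mean, over non-empty clusters, of the average document-to-centroid Euclidean distance (lower is better). The fitness function, the inertia and acceleration constants (w=0.72, c1=c2=1.49), and all function names are assumptions chosen for illustration rather than the paper's published settings; documents are assumed to already be numeric term vectors.

```python
import math
import random


def distance(a, b):
    # Euclidean distance between two equal-length term vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(docs, centroids):
    # Index of the nearest centroid for each document
    return [min(range(len(centroids)), key=lambda k: distance(d, centroids[k])) for d in docs]


def fitness(docs, centroids):
    # Mean over non-empty clusters of the average document-to-centroid distance (lower is better)
    labels = assign(docs, centroids)
    per_cluster = []
    for k, c in enumerate(centroids):
        members = [d for d, lab in zip(docs, labels) if lab == k]
        if members:
            per_cluster.append(sum(distance(d, c) for d in members) / len(members))
    return sum(per_cluster) / len(per_cluster)


def pso_cluster(docs, k, n_particles=10, iters=50, w=0.72, c1=1.49, c2=1.49, seed=0):
    rng = random.Random(seed)
    dim = len(docs[0])

    def unflatten(p):
        # A particle is a flat list of k * dim numbers: k centroids laid end to end
        return [p[i * dim:(i + 1) * dim] for i in range(k)]

    # Initialize each particle's centroids at k distinct, randomly chosen documents
    pos = [[x for d in rng.sample(docs, k) for x in d] for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(docs, unflatten(p)) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + pull toward personal best + pull toward global best
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            f = fitness(docs, unflatten(pos[i]))
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f

    centroids = unflatten(gbest)
    return centroids, assign(docs, centroids), gbest_fit
```

One plausible hybrid variant would hand the PSO centroids to K-means as starting seeds for local refinement; that step is omitted here.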

International Conference on Machine Learning and Applications | 2006

TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams

Joel W. Reed; Yu Jiao; Thomas E. Potok; Brian A. Klump; Mark T. Elmore; Ali R. Hurson

In this paper, we propose a new term weighting scheme called term frequency-inverse corpus frequency (TF-ICF). It does not require term frequency information from other documents within the document collection, and thus it enables us to generate the document vectors of N streaming documents in linear time. In the context of a machine learning application, unsupervised document clustering, we evaluated the effectiveness of the proposed approach in comparison to five widely used term weighting schemes through extensive experimentation. Our results show that TF-ICF can produce document clusters that are comparable in quality to those generated by the widely recognized term weighting schemes, and that it is significantly faster than those methods.
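
A minimal sketch of why this runs in linear time: corpus frequencies come from a fixed reference corpus computed once, so each streaming document is weighted using only its own term counts. The weight used below, log(1 + f) * log((N + 1) / (n + 1)), where f is the term's frequency in the document, N the number of reference-corpus documents, and n the number of those containing the term, is the form assumed here for TF-ICF; the function names are hypothetical and tokenization is assumed to be done already.

```python
import math
from collections import Counter


def corpus_doc_freq(corpus):
    # n_j: number of reference-corpus documents containing each term (computed once, offline)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    return df, len(corpus)


def tf_icf(doc, df, n_docs):
    # Weight one incoming document using only the static corpus statistics,
    # so its cost depends on the document's own length, not on the rest of the stream.
    tf = Counter(doc)
    return {t: math.log(1 + f) * math.log((n_docs + 1) / (df.get(t, 0) + 1))
            for t, f in tf.items()}
```

Terms that appear in every reference document get weight 0, and terms never seen in the reference corpus get the largest inverse corpus factor.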

Journal of Systems Architecture | 2006

A flocking based algorithm for document clustering analysis

Xiaohui Cui; Jinzhu Gao; Thomas E. Potok

Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking-based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike partitional clustering algorithms such as K-means, the Flocking-based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data by embedding the high-dimensional data items on a two-dimensional grid for easy retrieval and visualization of the clustering result. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each boid cause the entire document flock to generate complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection of 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance than the K-means and Ant clustering algorithms for real document clustering.
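
The local rules described above can be illustrated with a small Python sketch of one synchronous update step on a 2-D plane. The rule set is a hypothetical simplification in the spirit of Reynolds' cohesion, alignment, and separation: neighbors within a radius whose documents have cosine similarity at or above a threshold pull a boid toward them and align its velocity, while dissimilar neighbors push it away. The radius, threshold, weight, and speed cap are assumed values, and the paper's actual rules may differ.

```python
import math


def cosine(u, v):
    # Cosine similarity of two document vectors (0.0 if either is all zeros)
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def flock_step(pos, vel, vecs, radius=5.0, threshold=0.5, weight=0.1, max_speed=1.0):
    """One synchronous update: similar neighbors attract and align, dissimilar ones repel."""
    new_vel = []
    for i, (p, v) in enumerate(zip(pos, vel)):
        dx = dy = 0.0
        for j, q in enumerate(pos):
            if i == j:
                continue
            ox, oy = q[0] - p[0], q[1] - p[1]
            dist = math.hypot(ox, oy)
            if dist == 0 or dist > radius:
                continue  # only boids inside the neighborhood radius are considered
            sim = cosine(vecs[i], vecs[j])
            if sim >= threshold:
                # cohesion toward the neighbor plus alignment with its velocity, scaled by similarity
                dx += sim * (ox + vel[j][0] - v[0])
                dy += sim * (oy + vel[j][1] - v[1])
            else:
                # separation: push away, more strongly when the neighbor is closer
                dx -= ox / (dist * dist)
                dy -= oy / (dist * dist)
        nvx, nvy = v[0] + weight * dx, v[1] + weight * dy
        speed = math.hypot(nvx, nvy)
        if speed > max_speed:
            nvx, nvy = nvx * max_speed / speed, nvy * max_speed / speed
        new_vel.append((nvx, nvy))
    new_pos = [(p[0] + v[0], p[1] + v[1]) for p, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

Calling flock_step repeatedly should let similar documents drift together on the plane, after which clusters could be read off by grouping boids that end up close to each other.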

Photogrammetric Engineering and Remote Sensing | 2006

Automated Feature Generation in Large-Scale Geospatial Libraries for Content-Based Indexing

Kenneth W. Tobin; Budhendra L. Bhaduri; Eddie A Bright; Anil Cheriydat; Thomas P. Karnowski; Paul J. Palathingal; Thomas E. Potok; Jeffery R. Price

We describe a method for indexing and retrieving high-resolution image regions in large geospatial data libraries. An automated feature extraction method is used that generates a unique and specific structural description of each segment of a tessellated input image file. These tessellated regions are then merged into similar groups, or sub-regions, and indexed to provide flexible and varied retrieval in a query-by-example environment. The methods of tessellation, feature extraction, sub-region clustering, indexing, and retrieval are described and demonstrated using a geospatial library representing a 153 km² region of land in East Tennessee at 0.5 m per pixel resolution.
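
The pipeline above runs from tessellation through feature extraction, indexing, and query-by-example retrieval. The toy Python sketch below covers only tessellation, a per-tile descriptor, and nearest-neighbor retrieval. The mean/standard-deviation descriptor, the fixed square tiles, and the function names are assumptions that are far simpler than the structural description used in the paper, and the sub-region merging step is left out.

```python
import math


def tessellate(image, tile):
    # Split a 2-D grid of pixel intensities into non-overlapping tile x tile blocks (ragged edges dropped)
    rows, cols = len(image), len(image[0])
    tiles = {}
    for r in range(0, rows - tile + 1, tile):
        for c in range(0, cols - tile + 1, tile):
            tiles[(r, c)] = [image[r + i][c + j] for i in range(tile) for j in range(tile)]
    return tiles


def features(pixels):
    # Toy descriptor: mean and standard deviation of the tile's intensities
    m = sum(pixels) / len(pixels)
    sd = math.sqrt(sum((p - m) ** 2 for p in pixels) / len(pixels))
    return (m, sd)


def build_index(image, tile):
    # Map each tile's top-left corner to its feature vector
    return {key: features(px) for key, px in tessellate(image, tile).items()}


def query_by_example(index, example_key, top=3):
    # Rank all indexed tiles by feature-space distance to the example tile
    q = index[example_key]
    ranked = sorted(index, key=lambda k: math.dist(index[k], q))
    return ranked[:top]
```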

Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing | 2005

Tracking non-stationary optimal solution by particle swarm optimizer

Xiaohui Cui; C. T. Hardin; Rammohan K. Ragade; Thomas E. Potok; A. S. Elmaghraby

In the real world, we frequently have to search for and track an optimal solution in a dynamic environment. This demands that the algorithm not only find the optimal solution but also track its trajectory as the environment changes. Particle swarm optimization (PSO) is a population-based stochastic optimization technique that can find an optimal, or near-optimal, solution to numerical and qualitative problems. However, the traditional PSO algorithm lacks the ability to track the optimal solution in a dynamic environment. In this paper, we present a modified PSO algorithm that can be used to track a non-stationary optimal solution in a dynamically changing environment.
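
The abstract does not spell out the modification, so the sketch below uses a generic, swapped-in strategy from the dynamic-optimization literature rather than the paper's method. The swarm remembers its best position as a sentry and detects a change when that position's score shifts. It then re-diversifies half of the particles and re-evaluates every stored personal best under the current objective, so outdated memories cannot mislead the swarm. For brevity it works on a one-dimensional objective(x, t); the parameter values and names are assumptions.

```python
import random


def track_optimum(objective, n_particles=20, steps=200, bounds=(-10.0, 10.0),
                  w=0.72, c1=1.49, c2=1.49, seed=1):
    """Minimize a 1-D objective(x, t) whose optimum may move with time t.
    Returns the global-best position recorded at every step."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    sentry, sentry_fit = None, None
    history = []
    for t in range(steps):
        # Change detection: the remembered best no longer scores what it did last step
        if sentry is not None and abs(objective(sentry, t) - sentry_fit) > 1e-12:
            # Re-diversify every other particle; the rest keep their memory
            for i in range(0, n_particles, 2):
                pos[i] = rng.uniform(lo, hi)
                vel[i] = 0.0
                pbest[i] = pos[i]
        # Re-evaluate stored bests under the current landscape, then compare with current positions
        pbest_fit = [objective(p, t) for p in pbest]
        for i in range(n_particles):
            f = objective(pos[i], t)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], f
        g = min(range(n_particles), key=lambda i: pbest_fit[i])
        gbest = pbest[g]
        sentry, sentry_fit = gbest, pbest_fit[g]
        history.append(gbest)
        # Standard PSO move, clamped to the search bounds
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = w * vel[i] + c1 * r1 * (pbest[i] - pos[i]) + c2 * r2 * (gbest - pos[i])
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))
    return history
```

For example, with an objective whose minimum jumps from x = 3 to x = -4 halfway through the run, the recorded best should follow the jump after the change is detected.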

Web Intelligence | 2007

Boosting-Based Distributed and Adaptive Security-Monitoring through Agent Collaboration

Evens Jean; Yu Jiao; Ali R. Hurson; Thomas E. Potok

The use of mobile agents to support the development of practical applications is limited primarily by the risks to which hosts in the system are subject. This article introduces a distributed and adaptive security-monitoring framework to decrease such potential threats. The proposed framework is based on a modified version of the popular Boosting algorithm and classifies malicious agents based on their execution patterns on current and prior hosts. Having implemented the framework for the Aglet platform, we present the results of our experiments, which showcase the detection of agents in the system whose intentions deviate from those of their well-behaved counterparts.
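
The abstract says the framework uses a modified Boosting algorithm but does not describe the modification. For orientation only, here is plain discrete AdaBoost with one-feature threshold stumps in Python. The assumption that each agent is summarized as a numeric feature vector of its execution behavior (for example, hypothetical counts of file accesses and network connections) and labeled +1 (malicious) or -1 (well-behaved) is made purely for illustration.

```python
import math


def train_stump(X, y, w):
    # Exhaustive search for the single-feature threshold rule with the lowest weighted error
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[f] >= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def stump_predict(stump, x):
    _, f, thr, sign = stump
    return sign if x[f] >= thr else -sign


def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to avoid division by zero and log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest, then renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if stump[0] == 0:
            break  # a perfect stump: further rounds would add nothing
    return model


def predict(model, x):
    # Weighted vote of all stumps; +1 means "malicious" under the labeling assumed above
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1
```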

Hawaii International Conference on System Sciences | 2002

An ontology-based HTML to XML conversion using intelligent agents

Thomas E. Potok; Mark T. Elmore; Joel W. Reed; Nagiza F. Samatova

How to organize and classify large amounts of heterogeneous information accessible over the Internet is a major problem faced by industry, government, and military organizations. XML is clearly a potential solution to this problem; however, a significant challenge is how to automatically convert information currently expressed in standard HTML format to XML format. Within the Virtual Information Processing Agent Research (VIPAR) project, we have developed a process that uses Internet ontologies and intelligent software agents to perform automatic HTML to XML conversion for Internet newspapers. The VIPAR software is based on a number of significant research breakthroughs, most notably the ability of intelligent agents to use a flexible RDF ontology to transform HTML documents into XML-tagged documents. The VIPAR system is currently deployed at the USA Pacific Command, Camp Smith, HI, traversing up to 17 Internet newspapers daily.
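
VIPAR's agents use an RDF ontology to drive the conversion. The minimal Python sketch below replaces that ontology with a hypothetical dictionary (ONTOLOGY) mapping HTML class attributes to semantic XML element names, and uses only the standard-library html.parser and xml.etree.ElementTree modules. It involves no agents and is not the VIPAR implementation.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an ontology: HTML class name -> semantic XML element name
ONTOLOGY = {"headline": "title", "byline": "author", "story": "body"}


class OntologyConverter(HTMLParser):
    def __init__(self, ontology):
        super().__init__()
        self.ontology = ontology
        self.root = ET.Element("article")
        self.stack = []  # open HTML tags, each paired with the XML element it maps to (or None)

    def handle_starttag(self, tag, attrs):
        concept = self.ontology.get(dict(attrs).get("class", ""))
        elem = ET.SubElement(self.root, concept) if concept else None
        self.stack.append((tag, elem))

    def handle_endtag(self, tag):
        # Pop up to and including the most recent matching start tag (tolerates sloppy HTML)
        for k in range(len(self.stack) - 1, -1, -1):
            if self.stack[k][0] == tag:
                del self.stack[k:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attach text to the innermost enclosing HTML element that the ontology recognizes
        for _, elem in reversed(self.stack):
            if elem is not None:
                elem.text = (elem.text + " " + text) if elem.text else text
                break


def html_to_xml(html, ontology=ONTOLOGY):
    conv = OntologyConverter(ontology)
    conv.feed(html)
    conv.close()
    return ET.tostring(conv.root, encoding="unicode")
```

Text inside elements the mapping does not recognize is simply dropped, which is one simple policy among several possible ones.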

Genetic and Evolutionary Computation Conference | 2009

Parallel latent semantic analysis using a graphics processing unit

Joseph M. Cavanagh; Thomas E. Potok; Xiaohui Cui

Latent Semantic Analysis (LSA) can be used to reduce the dimensions of large term-document datasets using Singular Value Decomposition (SVD). However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequential processor (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than a computer cluster. In this paper, we present a parallel LSA implementation on the GPU, using NVIDIA® Compute Unified Device Architecture (CUDA) and Compute Unified Basic Linear Algebra Subprograms (CUBLAS). The performance of this implementation is compared to a traditional LSA implementation on the CPU using an optimized Basic Linear Algebra Subprograms library. For large matrices with dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version.
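
To make the dimension-reduction step concrete, the standard-library Python sketch below computes a rank-k truncated SVD of a small term-document matrix (terms as rows, documents as columns) by power iteration with deflation, then projects each document into the k-dimensional latent space as a column of Sigma_k V_k^T. Power iteration is a swapped-in, CPU-only technique chosen for brevity; the abstract does not say which SVD algorithm the GPU implementation uses, and the function names are hypothetical.

```python
import math
import random


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def top_k_svd(A, k, iters=200, seed=0):
    """Rank-k SVD of A by power iteration on A^T A with deflation.
    Returns a list of (sigma, u, v) triplets, largest sigma first."""
    rng = random.Random(seed)
    At = transpose(A)
    triplets = []
    for _ in range(k):
        v = [rng.random() for _ in range(len(At))]  # one entry per document (column)
        for _ in range(iters):
            w = matvec(At, matvec(A, v))
            # Deflation: project out right singular vectors already found
            for _, _, vp in triplets:
                dot = sum(a * b for a, b in zip(w, vp))
                w = [a - dot * b for a, b in zip(w, vp)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0:
                break
            v = [a / norm for a in w]
        Av = matvec(A, v)
        sigma = math.sqrt(sum(a * a for a in Av))
        u = [a / sigma for a in Av] if sigma else Av
        triplets.append((sigma, u, v))
    return triplets


def doc_coordinates(triplets):
    # Column j of Sigma_k V_k^T: the k-dimensional LSA representation of document j
    n_docs = len(triplets[0][2])
    return [[sigma * v[j] for sigma, _, v in triplets] for j in range(n_docs)]
```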

Undergraduate Research Journal | 2008

Flocking-based Document Clustering on the Graphics Processing Unit

J S Charles; Robert M. Patton; Thomas E. Potok; Xiaohui Cui

Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. Each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its O(n²) complexity. As the number of documents grows, it becomes increasingly difficult to obtain results in a reasonable amount of time. However, flocking behavior, along with many naturally inspired algorithms such as ant colony optimization and particle swarm optimization, is highly parallel and has achieved increased performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal compared with the price of cluster machines. In this paper, we exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to many naturally inspired algorithms. Using the CUDA platform from NVIDIA®, we developed a document flocking implementation to run on the NVIDIA® GeForce 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3000 documents. The GPU implementation ran three to nearly five times faster than the CPU implementation. Our results also show that both implementations have similar complexity, confirming that the gains come from the hardware and not from algorithmic differences. This improvement in runtime makes the GPU a potentially powerful new platform for document analysis.

Cooperative Information Agents | 2006

A distributed agent implementation of multiple species flocking model for document partitioning clustering

Xiaohui Cui; Thomas E. Potok

The Flocking model, first proposed by Craig Reynolds, is one of the first bio-inspired computational models of collective behavior and has many popular applications, such as animation. Our earlier research produced a flock clustering algorithm that can achieve better performance than the K-means or Ant clustering algorithms for data clustering. This algorithm generates a clustering of a given set of data by embedding the high-dimensional data items on a two-dimensional grid for efficient retrieval and visualization of the clustering result. In this paper, we propose a bio-inspired clustering model, the Multiple Species Flocking (MSF) clustering model, and present a distributed multi-agent MSF approach for document clustering.