William M. Pottenger
Rutgers University
Publications
Featured research published by William M. Pottenger.
Information Processing and Management | 2006
April Kontostathis; William M. Pottenger
In this paper we present a theoretical model for understanding the performance of Latent Semantic Indexing (LSI) search and retrieval applications. Many models for understanding LSI have been proposed. Ours is the first to study the values produced by LSI in the term-by-dimension vectors. The framework presented here is based on term co-occurrence data. We show a strong correlation between second-order term co-occurrence and the values produced by the Singular Value Decomposition (SVD) algorithm that forms the foundation for LSI. We also present a mathematical proof that the SVD algorithm encapsulates term co-occurrence information.
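The relationship between second-order co-occurrence and the SVD can be sketched numerically. The toy term-document matrix below is a hypothetical illustration, not data from the paper:

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# The corpus here is hypothetical.
A = np.array([
    [1, 1, 0],  # "latent"
    [1, 0, 0],  # "semantic"
    [0, 1, 1],  # "indexing"
    [0, 0, 1],  # "retrieval"
], dtype=float)

# First-order co-occurrence: terms sharing at least one document.
C1 = (A @ A.T) > 0

# Second-order co-occurrence: terms linked through a shared neighbor.
# Here "semantic" and "indexing" never co-occur directly, but both
# co-occur with "latent", giving them a second-order link.
C2 = (C1.astype(int) @ C1.astype(int)) > 0

# Rank-2 LSI term vectors via truncated SVD; the paper's claim is that
# the values in these vectors correlate with second-order co-occurrence.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]

print(C1.astype(int))
print(C2.astype(int))
print(term_vectors.shape)
```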
Archive | 2004
April Kontostathis; Leon M. Galitsky; William M. Pottenger; Soma Roy; Daniel J. Phelps
In this chapter we describe several systems that detect emerging trends in textual data. Some of the systems are semiautomatic, requiring user input to begin processing, and others are fully automatic, producing output from the input corpus without guidance. For each Emerging Trend Detection (ETD) system we describe components including linguistic and statistical features, learning algorithms, training and test set generation, visualization, and evaluation. We also provide a brief overview of several commercial products capable of detecting trends in textual data, followed by an industrial viewpoint describing the importance of trend detection tools and an overview of how such tools are used.
languages and compilers for parallel computing | 1994
William Blume; Rudolf Eigenmann; Keith A. Faigin; John R. Grout; Jay Hoeflinger; David A. Padua; Paul M. Petersen; William M. Pottenger; Lawrence Rauchwerger; Peng Tu; Stephen A. Weatherford
It is the goal of the Polaris project to develop a new parallelizing compiler that will overcome limitations of current compilers. While current parallelizing compilers may succeed on small kernels, they often fail to extract any meaningful parallelism from large applications. After a study of application codes, it was concluded that by adding a few new techniques to current compilers, automatic parallelization becomes possible. The techniques needed are interprocedural analysis, scalar and array privatization, symbolic dependence analysis, and advanced induction and reduction recognition and elimination, along with run-time techniques to allow data dependent behavior.
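One of the listed techniques, induction-variable recognition and elimination, can be illustrated with a small sketch (in Python rather than the Fortran codes Polaris targets; the loop is hypothetical):

```python
# Sequential loop with a cross-iteration induction variable `idx`:
# each iteration depends on the previous one through idx.
def pack_sequential(a):
    out = []
    idx = 0
    for i in range(len(a)):
        idx += 2          # induction variable: idx = 2 * (i + 1)
        out.append((idx, a[i]))
    return out

# After induction-variable substitution, idx is computed in closed
# form per iteration, so iterations are independent and the loop can
# be parallelized.
def pack_parallel(a):
    return [(2 * (i + 1), a[i]) for i in range(len(a))]

print(pack_sequential([10, 20]) == pack_parallel([10, 20]))  # → True
```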
international symposium on circuits and systems | 2005
Faisal M. Khan; Mark G. Arnold; William M. Pottenger
Support vector machines are emerging as a powerful machine-learning tool. Logarithmic number systems (LNS) utilize the property of logarithmic compression for numerical operations. We present an implementation of a digital support vector machine (SVM) classifier using LNS in which, when compared with other implementations, considerable hardware savings are achieved with no significant loss in classification accuracy.
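The hardware saving comes from the core LNS property: multiplication becomes addition in the log domain. A minimal sketch, with illustrative values rather than anything from the paper's hardware design:

```python
import math

# Logarithmic number system (LNS) sketch: a positive real x is stored
# as log2(x), so multiplication reduces to addition, which is cheap
# in hardware.
def to_lns(x):
    return math.log2(x)

def from_lns(e):
    return 2.0 ** e

def lns_mul(ex, ey):
    return ex + ey  # multiply = add in the log domain

def lns_add(ex, ey):
    # Addition needs the function log2(1 + 2^d); in hardware this is
    # a small lookup table, here it is computed directly.
    m, d = max(ex, ey), -abs(ex - ey)
    return m + math.log2(1.0 + 2.0 ** d)

product = from_lns(lns_mul(to_lns(3.0), to_lns(5.0)))
total = from_lns(lns_add(to_lns(3.0), to_lns(5.0)))
print(product, total)  # ≈ 15.0 and 8.0
```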
IEEE Transactions on Knowledge and Data Engineering | 2011
Murat Can Ganiz; Cibin George; William M. Pottenger
The underlying assumption in traditional machine learning algorithms is that instances are Independent and Identically Distributed (IID). This independence assumption prevents such algorithms from going beyond instance boundaries to exploit latent relations between features. In this paper, we develop a general approach to supervised learning by leveraging higher order dependencies between features. We introduce a novel Bayesian framework for classification termed Higher Order Naïve Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages higher order relations between features across different instances. The approach is validated in the classification domain on widely used benchmark data sets. Results obtained on several benchmark text corpora demonstrate that higher order approaches achieve significant improvements in classification accuracy over the baseline methods, especially when training data is scarce. A complexity analysis also reveals that the space and time complexity of HONB compare favorably with existing approaches.
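The higher order relations HONB exploits can be pictured as paths between features running through different instances. A minimal sketch with a hypothetical document-feature matrix:

```python
import numpy as np

# Hypothetical boolean document-feature matrix (rows = documents).
X = np.array([
    [1, 1, 0, 0],   # doc0 contains f0, f1
    [0, 1, 1, 0],   # doc1 contains f1, f2
    [0, 0, 1, 1],   # doc2 contains f2, f3
], dtype=int)

# First-order relation: features co-occurring in the same document.
F1 = (X.T @ X) > 0

# Higher-order relation: f0 is linked to f2 through f1 across two
# different documents (f0-doc0-f1, then f1-doc1-f2), even though f0
# and f2 never appear together in any single document.
F2 = (F1.astype(int) @ F1.astype(int)) > 0

print(int(F1[0, 2]), int(F2[0, 2]))  # → 0 1
```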
acm international conference on digital libraries | 1998
Yi-Ming Chung; William M. Pottenger; Bruce R. Schatz
The global growth in popularity of the World Wide Web has been enabled in part by the availability of browser-based search tools, which in turn have led to an increased demand for indexing techniques and technologies. As the amount of globally accessible information in community repositories grows, it is no longer cost-effective for such repositories to be indexed by professional indexers who have been trained to be consistent in subject assignment from controlled vocabulary lists. The era of amateur indexers is thus upon us, and the information infrastructure needs to provide support for such indexing if search of the Net is to produce useful results. In this paper, we propose the Concept Assigner, an automatic subject indexing system based on a variant of the Hopfield network (Hopfield, 1982). In the application discussed herein, a collection of documents is used to automatically create a subset of a thesaurus termed a Concept Space (Chen & Schatz, 1997). To automatically index an individual document, concepts extracted from the given document become the input pattern to a Concept Space represented as a Hopfield network. The Hopfield net's parallel spreading activation process produces another set of concepts that are strongly related to the concepts of the input document. Such concepts are suitable for use in an interactive indexing environment. A prototype of our automatic subject indexing system has been implemented as part of the Interspace, a semantic indexing and retrieval environment which supports statistically based semantic indexing in a persistent object environment.
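The spreading activation step can be sketched minimally; the concept network, weights, and threshold below are hypothetical, not taken from the Interspace system:

```python
import numpy as np

# Hypothetical concept network: symmetric link weights between
# concepts in a small Concept Space.
concepts = ["indexing", "thesaurus", "retrieval", "hopfield"]
W = np.array([
    [0.0, 0.8, 0.6, 0.0],
    [0.8, 0.0, 0.3, 0.0],
    [0.6, 0.3, 0.0, 0.2],
    [0.0, 0.0, 0.2, 0.0],
])

def spread(seed, steps=5, threshold=0.5):
    # Iteratively propagate activation through the weighted links,
    # clamping the input concepts on, until activation stabilizes.
    a = seed.copy()
    for _ in range(steps):
        a = np.maximum(a, np.tanh(W @ a))
    return [c for c, v in zip(concepts, a) if v >= threshold]

seed = np.array([1.0, 0.0, 0.0, 0.0])  # document mentions "indexing"
activated = spread(seed)
print(activated)
```

Weakly linked concepts ("hopfield" here) stay below the threshold, while strongly related ones are returned as candidate index terms.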
Journal of the Association for Information Science and Technology | 2005
Tianhao Wu; William M. Pottenger
In this article we present a semi-supervised active learning algorithm for pattern discovery in information extraction from textual data. The patterns are reduced regular expressions composed of various characteristics of features useful in information extraction. Our major contribution is a semi-supervised learning algorithm that extracts information from a set of examples labeled as relevant or irrelevant to a given attribute. The approach is semi-supervised because it does not require precise labeling of the exact location of features in the training data. This significantly reduces the effort needed to develop a training set. An active learning algorithm is used to assist the semi-supervised learning algorithm to further reduce the training set development effort. The active learning algorithm is seeded with a single positive example of a given attribute. The context of the seed is used to automatically identify candidates for additional positive examples of the given attribute. Candidate examples are manually pruned during the active learning phase, and our semi-supervised learning algorithm automatically discovers reduced regular expressions for each attribute. We have successfully applied this learning technique in the extraction of textual features from police incident reports, university crime reports, and patents. The performance of our algorithm compares favorably with competitive extraction systems being used in criminal justice information systems.
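A minimal sketch of what a discovered pattern might look like in use; the pattern, attribute, and report sentences below are hypothetical, not from the paper's training data:

```python
import re

# Hypothetical reduced-regular-expression-style pattern for an "age"
# attribute, grown from a seed example and its surrounding context.
seed_context = r"age\s+(\d{1,3})"

def extract_age(text):
    # Return the captured age if the pattern matches, else None.
    m = re.search(seed_context, text, flags=re.IGNORECASE)
    return m.group(1) if m else None

reports = [
    "The suspect, age 34, fled on foot.",
    "Victim reported stolen bicycle.",
]
print([extract_age(r) for r in reports])  # → ['34', None]
```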
european conference on machine learning | 2009
Murat Can Ganiz; Nikita I. Lytkin; William M. Pottenger
Traditional machine learning methods only consider relationships between feature values within individual data instances while disregarding the dependencies that link features across instances. In this work, we develop a general approach to supervised learning by leveraging higher-order dependencies between features. We introduce a novel Bayesian framework for classification named Higher Order Naive Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages co-occurrence relations between feature values across different instances. Additionally, we generalize our framework by developing a novel data-driven space transformation that allows any classifier operating in vector spaces to take advantage of these higher-order co-occurrence relations. Results obtained on several benchmark text corpora demonstrate that higher-order approaches achieve significant improvements in classification accuracy over the baseline (first-order) methods.
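The space transformation idea can be sketched minimally as projecting each instance through the feature co-occurrence matrix; this is a hypothetical construction for illustration, not the paper's exact transformation:

```python
import numpy as np

# Hypothetical document-feature matrix (rows = instances).
X = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

S = X.T @ X             # feature-by-feature co-occurrence counts
np.fill_diagonal(S, 0)  # keep only cross-feature relations
X_higher = X @ S        # instances re-expressed in a higher-order space

# doc0 never contains f2, but in the transformed space it gains weight
# on f2 through f1's co-occurrence with f2 in another instance. Any
# vector-space classifier can then be trained on X_higher.
print(X[0, 2], X_higher[0, 2])
```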
Sigkdd Explorations | 2005
Shenzhi Li; Tianhao Wu; William M. Pottenger
The burgeoning amount of textual data in distributed sources combined with the obstacles involved in creating and maintaining central repositories motivates the need for effective distributed information extraction and mining techniques. Recently, as the need to mine patterns across distributed databases has grown, Distributed Association Rule Mining (D-ARM) algorithms have been developed. These algorithms, however, assume that the databases are either horizontally or vertically distributed. In the special case of databases populated from information extracted from textual data, existing D-ARM algorithms cannot discover rules based on higher-order associations between items in distributed textual documents that are neither vertically nor horizontally distributed, but rather a hybrid of the two. In this article we present D-HOTM, a framework for Distributed Higher Order Text Mining. D-HOTM is a hybrid approach that combines information extraction and distributed data mining. We employ a novel information extraction technique to extract meaningful entities from unstructured text in a distributed environment. The information extracted is stored in local databases and a mapping function is applied to identify globally unique keys. Based on the extracted information, a novel distributed association rule mining algorithm is applied to discover higher-order associations between items (i.e., entities) in records fragmented across the distributed databases using the keys. Unlike existing algorithms, D-HOTM requires neither knowledge of a global schema nor that the distribution of data be horizontal or vertical. Evaluation methods are proposed to incorporate the performance of the mapping function into the traditional support metric used in ARM evaluation. An example application of the algorithm on distributed law enforcement data demonstrates the relevance of D-HOTM in the fight against terrorism.
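The key-based linking that exposes higher-order associations can be sketched minimally; the sites, keys, and items below are hypothetical, not the paper's law enforcement data:

```python
from collections import defaultdict

# Records about the same entity are fragmented across sites; a mapping
# function has already assigned each record a globally unique key.
site1 = [{"key": "person-1", "item": "vehicle-A"}]
site2 = [{"key": "person-1", "item": "address-B"}]

# Group items by shared key across all sites.
by_key = defaultdict(set)
for record in site1 + site2:
    by_key[record["key"]].add(record["item"])

# Higher-order association: vehicle-A and address-B co-occur only once
# the fragmented records are linked through the shared key; neither
# site alone contains the pair.
associations = {frozenset(items) for items in by_key.values() if len(items) > 1}
print(associations)
```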
Information Visualization | 2009
William Ribarsky; Brian D. Fisher; William M. Pottenger
There has been progress in the science of analytical reasoning and in meeting the recommendations for future research that were laid out when the field of visual analytics was established. Researchers have also developed a group of visual analytics tools and methods that embody visual analytics principles and attack important and challenging real-world problems. However, these efforts are only the beginning and much study remains to be done. This article examines the state of the art in visual analytics methods and reasoning and gives examples of current tools and capabilities. It shows that the science of visual analytics needs interdisciplinary efforts, indicates some of the disciplines that should be involved and presents an approach to how they might work together. Finally, the article describes some gaps, opportunities and future directions in developing new theories and models that can be enacted in methods and design principles and applied to significant and complex practical problems and data.