Jeffrey S. Coombs
University of Nevada, Las Vegas
Publication
Featured research published by Jeffrey S. Coombs.
international conference on information technology coding and computing | 2003
Kazem Taghva; Julie Borsack; Jeffrey S. Coombs; Allen Condit; Steven E. Lumos; Thomas A. Nartker
We report on the construction of an ontology that applies rules for identification of features to be used for email classification. The associated probabilities for these features are then calculated from the training set of emails and used as a part of the feature vectors for an underlying Bayesian classifier.
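As a rough illustration of the kind of Bayesian classifier described above, the following is a minimal sketch of a Bernoulli naive Bayes model over binary features. The abstract does not give the ontology rules or the exact model, so the feature names (`has_meeting_date`, `mentions_sale`, and so on), the labels, and the Laplace smoothing are all assumptions made for the example.

```python
import math
from collections import defaultdict

def train(examples):
    """examples: list of (set_of_features, label) pairs.

    Returns class priors, smoothed P(feature present | label), and the
    feature vocabulary. Laplace smoothing is an assumption of this sketch.
    """
    label_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for feats, label in examples:
        label_counts[label] += 1
        for f in feats:
            feat_counts[label][f] += 1
            vocab.add(f)
    n = len(examples)
    priors = {c: k / n for c, k in label_counts.items()}
    cond = {
        c: {f: (feat_counts[c][f] + 1) / (label_counts[c] + 2) for f in vocab}
        for c in label_counts
    }
    return priors, cond, vocab

def classify(feats, priors, cond, vocab):
    """Return the label with the highest log posterior for a feature set."""
    def log_post(c):
        lp = math.log(priors[c])
        for f in vocab:
            p = cond[c][f]
            # Bernoulli model: absent features also contribute evidence.
            lp += math.log(p if f in feats else 1 - p)
        return lp
    return max(priors, key=log_post)

# Hypothetical training emails, already reduced to rule-derived features.
EXAMPLES = [
    ({"has_meeting_date", "from_colleague"}, "work"),
    ({"has_meeting_date"}, "work"),
    ({"mentions_sale"}, "promo"),
    ({"mentions_sale", "has_unsubscribe"}, "promo"),
]
PRIORS, COND, VOCAB = train(EXAMPLES)
```

Calling `classify({"mentions_sale"}, PRIORS, COND, VOCAB)` on this toy data selects the `promo` label.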
document analysis systems | 2006
Kazem Taghva; Russell Beckley; Jeffrey S. Coombs
OCR error has been shown not to affect the average accuracy of text retrieval or text categorization. Recent studies however have indicated that information extraction is significantly degraded by OCR error. We experimented with information extraction software on two collections, one with OCR-ed documents and another with manually-corrected versions of the former. We discovered a significant reduction in accuracy on the OCR text versus the corrected text. The majority of errors were attributable to zoning problems rather than OCR classification errors.
document recognition and retrieval | 2005
Kazem Taghva; Jeffrey S. Coombs; Ray Pereda; Thomas A. Nartker
This paper presents the implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text. Although Hidden Markov Models discover addresses with high precision and recall, this type of Information Extraction task seems to be affected negatively by the presence of OCR text.
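To make the Hidden Markov Model idea concrete, here is a minimal sketch of Viterbi decoding over a two-state model that labels tokens as address or non-address. It is not the model from the paper: the states, the token classes (`NUM`, `STREET`, `WORD`), and every probability below are invented for illustration, and the input is assumed to be a non-empty token sequence.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable state sequence for obs (log-space Viterbi).

    Unseen emissions get a small floor probability (an assumption of this
    sketch) so that noisy tokens do not zero out a path.
    """
    def emit(s, o):
        return math.log(emit_p[s].get(o, floor))

    # best[t][s] = (log prob of best path ending in s at step t, previous state)
    best = [{s: (math.log(start_p[s]) + emit(s, obs[0]), None) for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        row = {}
        for s in states:
            p, back = max((prev[r][0] + math.log(trans_p[r][s]), r) for r in states)
            row[s] = (p + emit(s, o), back)
        best.append(row)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical toy model: all numbers are made up for the example.
STATES = ["ADDR", "OTHER"]
START = {"ADDR": 0.1, "OTHER": 0.9}
TRANS = {
    "ADDR": {"ADDR": 0.7, "OTHER": 0.3},
    "OTHER": {"ADDR": 0.2, "OTHER": 0.8},
}
EMIT = {
    "ADDR": {"NUM": 0.4, "STREET": 0.4, "WORD": 0.2},
    "OTHER": {"NUM": 0.05, "STREET": 0.05, "WORD": 0.9},
}
OBS = ["WORD", "NUM", "STREET", "WORD"]
```

With these toy parameters, `viterbi(OBS, STATES, START, TRANS, EMIT)` labels the two middle tokens as `ADDR` and the outer tokens as `OTHER`.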
international conference on information technology coding and computing | 2004
Kazem Taghva; Jeffrey S. Coombs; Ray Pereda; Tom Nartker
We report on an application of language modeling techniques to the retrieval of Farsi documents. We discovered that language modeling improves the precision of retrieval when compared to a standard vector space model.
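One common form of language-model retrieval is query-likelihood ranking, sketched below. The abstract does not say which language model or smoothing method was used, so the Jelinek-Mercer interpolation, the weight `lam`, and the placeholder tokens are assumptions of this example rather than a description of the authors' system.

```python
import math
from collections import Counter

def score(query, doc, coll_counts, coll_len, lam=0.5):
    """Log query likelihood of doc with Jelinek-Mercer smoothing."""
    doc_counts = Counter(doc)
    doc_len = len(doc)
    log_p = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len
        # Interpolate the document model with the collection model.
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        log_p += math.log(p)
    return log_p

def rank(query, docs, lam=0.5):
    """Return document indices ordered from best to worst score."""
    coll_counts = Counter(t for d in docs for t in d)
    coll_len = sum(coll_counts.values())
    scored = [(score(query, d, coll_counts, coll_len, lam), i)
              for i, d in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda pair: -pair[0])]

# Placeholder tokenized documents; real input would be tokenized Farsi text.
DOCS = [
    ["t1", "t2"],
    ["t2", "t2", "t3"],
    ["t1", "t1"],
]
```

For the query `["t2"]`, `rank(["t2"], DOCS)` places the second document first because it has the highest relative frequency of the query term.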
document recognition and retrieval | 2003
Kazem Taghva; Julie Borsack; Thomas A. Nartker; Jeffrey S. Coombs; Ron Young
Hundreds of experiments over the last decade on the retrieval of OCR documents performed by the Information Science Research Institute have shown that OCR errors do not significantly affect retrievability. We extend those results to show that in the case of proximity searching, the removal of running headers and footers from OCR text will not improve retrievability for such searches.
international conference on information technology coding and computing | 2003
Wolfgang W. Bein; Jeffrey S. Coombs; Kazem Taghva
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is defined using the expected mutual information measure (EMIM). Since our aim for defining the similarity lists is to improve information retrieval (IR), we present the outcome of an experiment comparing the performance of an IR engine designed to use the similarity lists. Two methods were used to generate similarity lists: a brute-force technique and the Quadtree Heuristic. The performance of the list generated by the Quadtree Heuristic was commensurate with the brute force list.
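For reference, the expected mutual information measure between two terms can be computed from their binary occurrence across documents. The sketch below scores a single pair directly, which is the building block of a brute-force similarity list; it does not implement the Quadtree Heuristic, whose details the abstract does not give. Documents are assumed to be sets of terms, and the function name `emim` is chosen for this example.

```python
import math

def emim(docs, term_a, term_b):
    """EMIM of two terms over binary occurrence in a list of term sets.

    Sums P(x, y) * log(P(x, y) / (P(x) * P(y))) over the four
    presence/absence combinations, treating 0 * log 0 as 0.
    """
    n = len(docs)
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for doc in docs:
        counts[(int(term_a in doc), int(term_b in doc))] += 1

    total = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue
        p_xy = c / n
        p_x = (counts[(x, 0)] + counts[(x, 1)]) / n
        p_y = (counts[(0, y)] + counts[(1, y)]) / n
        total += p_xy * math.log(p_xy / (p_x * p_y))
    return total

# Two tiny hypothetical collections.
CO_OCCURRING = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]
INDEPENDENT = [{"a", "b"}, {"a"}, {"b"}, set()]
```

Terms that always appear together in half of the documents, as in `CO_OCCURRING`, score ln 2, while statistically independent terms, as in `INDEPENDENT`, score 0. A brute-force similarity list would evaluate this for every term pair, which is the cost a faster heuristic aims to avoid.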
Archive | 2003
Jeffrey S. Coombs
There was an amazing amount of attention devoted to modalities in “second” or “post-medieval” scholasticism. Modalities were approached with great sophistication from logical, epistemological, and metaphysical perspectives. I will concentrate here on presenting an overview of Catholic second scholastic discussions of the ontological basis for logical possibility. This problem was the subject of much debate among the Scholastics of the time, so much so that for the first time ever in the history of scholasticism (medieval or modern) it was granted its own distinct quaestio in the philosophical texts of the early seventeenth century.
document recognition and retrieval | 2006
Kazem Taghva; Russell Beckley; Jeffrey S. Coombs; Julie Borsack; Ray Pereda; Thomas A. Nartker
We report on an attempt to build an automatic redaction system by applying information extraction techniques to the identification of private dates of birth. We conclude that automatic redaction is a promising concept although information extraction is significantly affected by the presence of OCR error.
international conference on conceptual structures | 2011
Kazem Taghva; Russell Beckley; Jeffrey S. Coombs
Many applications of Formal Concept Analysis (FCA) start with a set of structured data such as objects and their properties. In practice, most readily available data is in the form of unstructured or semistructured text. A typical application of FCA assumes the extraction of objects and their properties by some other methods or techniques. For example, in the 2003 Los Alamos National Lab (LANL) project on Advanced Knowledge Integration In Assessing Terrorist Threats, a data extraction tool was used to mine the text for the structured data. In this paper, we provide a detailed description of our approach to extraction of personal names for possible subsequent use in FCA. Our basic approach is to integrate statistics on names and other words into an adaptation of a Hidden Markov Model (HMM). We use lists of names and their relative frequencies compiled from U.S. Census data. We also use a list of non-name words along with their frequencies in a training set from our collection of documents. These lists are compiled into one master list to be used as a part of the design.
document recognition and retrieval | 2003
Kazem Taghva; Jeffrey S. Coombs
A rule-based automatic text categorizer was tested to see if two types of thesaurus expansion, called query expansion and Junker expansion respectively, would improve categorization. Thesauri used were domain-specific to an OCR test collection focussed on a single topic. Results show that neither type of expansion significantly improved categorization.