Publications


Featured research published by Tom O'Hara.


Meeting of the Association for Computational Linguistics | 1999

Development and Use of a Gold-Standard Data Set for Subjectivity Classifications

Janyce Wiebe; Rebecca F. Bruce; Tom O'Hara

This paper presents a case study of analyzing and improving intercoder reliability in discourse tagging using statistical techniques. Bias-corrected tags are formulated and successfully used to guide a revision of the coding manual and develop an automatic classifier.
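The bias-corrected tags build on chance-corrected measures of intercoder agreement. As a rough illustration of that starting point (not the paper's correction procedure), the sketch below computes plain Cohen's kappa for two annotators over an invented subjective/objective tag sequence.

```python
from collections import Counter

def cohens_kappa(tags_a, tags_b):
    """Chance-corrected agreement between two annotators' tag sequences."""
    n = len(tags_a)
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Expected agreement if the annotators tagged independently with their own marginals.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    expected = sum(freq_a[t] * freq_b[t] for t in set(tags_a) | set(tags_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented subjective/objective annotations for ten sentences.
ann1 = ["subj", "obj", "subj", "subj", "obj", "obj", "subj", "obj", "subj", "obj"]
ann2 = ["subj", "obj", "obj", "subj", "obj", "obj", "subj", "subj", "subj", "obj"]
print(f"kappa = {cohens_kappa(ann1, ann2):.2f}")
```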


Journal of Artificial Intelligence Research | 1998

An empirical approach to temporal reference resolution

Janyce Wiebe; Tom O'Hara; Thorsten Öhrström-Sandgren; Kenneth J. McKeever

Scheduling dialogs, during which people negotiate the times of appointments, are common in everyday life. This paper reports the results of an in-depth empirical investigation of resolving explicit temporal references in scheduling dialogs. There are four phases of this work: data annotation and evaluation, model development, system implementation and evaluation, and model evaluation and analysis. The system and model were developed primarily on one set of data, and then applied later to a much more complex data set, to assess the generalizability of the model for the task being performed. Many different types of empirical methods are applied to pinpoint the strengths and weaknesses of the approach. Detailed annotation instructions were developed and an intercoder reliability study was performed, showing that naive annotators can reliably perform the targeted annotations. A fully automatic system has been developed and evaluated on unseen test data, with good results on both data sets. We adopt a pure realization of a recency-based focus model to identify precisely when it is and is not adequate for the task being addressed. In addition to system results, an in-depth evaluation of the model itself is presented, based on detailed manual annotations. The results are that few errors occur specifically due to the model of focus being used, and the set of anaphoric relations defined in the model is low in ambiguity for both data sets.
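A recency-based focus model can be illustrated with a simple focus list: resolve a partially specified time against the most recently mentioned compatible time. The field names and resolution rule below are illustrative assumptions, not the paper's actual model.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TimeRef:
    """A (possibly partial) temporal reference from a scheduling utterance."""
    weekday: Optional[str] = None
    hour: Optional[int] = None

def resolve(partial: TimeRef, focus: List[TimeRef]) -> TimeRef:
    """Fill missing fields of `partial` from the most recent compatible antecedent."""
    for antecedent in reversed(focus):  # most recent first
        compatible = all(
            getattr(partial, f) is None or getattr(partial, f) == getattr(antecedent, f)
            for f in ("weekday", "hour")
        )
        if compatible:
            return TimeRef(
                weekday=partial.weekday or antecedent.weekday,
                hour=partial.hour if partial.hour is not None else antecedent.hour,
            )
    return partial  # no compatible antecedent; leave underspecified

# "How about Tuesday at two?" ... "Two works." -> resolves to Tuesday, 2.
focus_list = [TimeRef(weekday="Tuesday", hour=2)]
print(resolve(TimeRef(hour=2), focus_list))
```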


North American Chapter of the Association for Computational Linguistics | 2003

Preposition semantic classification via Penn Treebank and FrameNet

Tom O'Hara; Janyce Wiebe

This paper reports on experiments in classifying the semantic role annotations assigned to prepositional phrases in both the Penn Treebank and FrameNet. In both cases, experiments are done to see how the prepositions can be classified given each dataset's role inventory, using standard word-sense disambiguation features. In addition to using traditional word collocations, the experiments incorporate class-based collocations in the form of WordNet hypernyms. For Treebank, the word collocations achieve slightly better performance: 78.5% versus 77.4% when separate classifiers are used per preposition. When using a single classifier for all of the prepositions together, the combined approach yields a significant gain at 85.8% accuracy versus 81.3% for word-only collocations. For FrameNet, the combined use of both collocation types achieves better performance for the individual classifiers: 70.3% versus 68.5%. However, classification using a single classifier is not effective due to confusion among the fine-grained roles.
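The class-based collocation idea can be sketched with NLTK's WordNet interface: each noun near the preposition contributes not just itself but also its hypernym synsets as features. The feature layout below is an assumption for illustration, not the paper's exact feature set.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def hypernym_features(word, max_levels=3):
    """Return the word plus WordNet hypernym synset names as class-based features."""
    feats = {f"word={word}"}
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return feats
    frontier = [synsets[0]]  # most frequent sense only, for simplicity
    for _ in range(max_levels):
        frontier = [h for s in frontier for h in s.hypernyms()]
        feats.update(f"hyper={h.name()}" for h in frontier)
    return feats

def pp_features(prep, head, obj):
    """Features for one prepositional phrase: attachment head, preposition, object."""
    return {f"prep={prep}"} | hypernym_features(head) | hypernym_features(obj)

# "ran in the park": classify the role of "in" from its head and object nouns.
print(sorted(pp_features("in", "run", "park")))
```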


Computational Linguistics | 2009

Exploiting semantic role resources for preposition disambiguation

Tom O'Hara; Janyce Wiebe

This article describes how semantic role resources can be exploited for preposition disambiguation. The main resources include the semantic role annotations provided by the Penn Treebank and FrameNet tagged corpora. The resources also include the assertions contained in the Factotum knowledge base, as well as information from Cyc and Conceptual Graphs. A common inventory is derived from these in support of definition analysis, which is the motivation for this work. The disambiguation concentrates on relations indicated by prepositional phrases, and is framed as word-sense disambiguation for the preposition in question. A new type of feature for word-sense disambiguation is introduced, using WordNet hypernyms as collocations rather than just words. Various experiments over the Penn Treebank and FrameNet data are presented, including prepositions classified separately versus together, and illustrating the effects of filtering. Similar experimentation is done over the Factotum data, including a method for inferring likely preposition usage from corpora, as knowledge bases do not generally indicate how relationships are expressed in English (in contrast to the explicit annotations on this in the Penn Treebank and FrameNet). Other experiments are included with the FrameNet data mapped into the common relation inventory developed for definition analysis, illustrating how preposition disambiguation might be applied in lexical acquisition.


Computers and The Humanities | 2000

Selecting Decomposable Models for Word-Sense Disambiguation: The Grling-Sdm System

Tom O'Hara; Janyce Wiebe; Rebecca F. Bruce

This paper describes the grling-sdm system, which is a supervised probabilistic classifier that participated in the 1998 SENSEVAL competition for word-sense disambiguation. This system uses model search to select decomposable probability models describing the dependencies among the feature variables. These types of models have been found to be advantageous in terms of efficiency and representational power. Performance on the SENSEVAL evaluation data is discussed.
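Naive Bayes is the simplest decomposable model, with every feature conditionally independent given the sense, and is one point in the space of dependency structures that model search explores. The sketch below trains only that simplest case on invented instances; it is not the grling-sdm search procedure itself.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(instances, alpha=1.0):
    """instances: list of (feature_set, sense). Returns a classify(features) function."""
    sense_counts = Counter(sense for _, sense in instances)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in instances:
        feat_counts[sense].update(feats)
        vocab.update(feats)

    def score(feats, sense):
        # Log prior plus smoothed log likelihood of each observed feature.
        logp = math.log(sense_counts[sense] / len(instances))
        total = sum(feat_counts[sense].values()) + alpha * len(vocab)
        for f in feats:
            logp += math.log((feat_counts[sense][f] + alpha) / total)
        return logp

    return lambda feats: max(sense_counts, key=lambda s: score(feats, s))

# Invented SENSEVAL-style instances for the noun "bank".
data = [({"money", "deposit"}, "finance"), ({"river", "water"}, "shore"),
        ({"loan", "money"}, "finance"), ({"river", "erosion"}, "shore")]
classify = train_naive_bayes(data)
print(classify({"river", "water", "money"}))  # -> "shore"
```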


International Conference on Computational Linguistics | 2003

Classifying functional relations in Factotum via WordNet hypernym associations

Tom O'Hara; Janyce Wiebe

This paper describes how to automatically classify the functional relations from the FACTOTUM knowledge base via a statistical machine learning algorithm. This incorporates a method for inferring prepositional relation indicators from corpus data. It also uses lexical collocations (i.e., word associations) and class-based collocations based on the WordNet hypernym relations (i.e., is-subset-of). The result shows substantial improvement over a baseline approach.
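One way to read "inferring prepositional relation indicators from corpus data" is to count which prepositions appear between known argument terms of a relation and keep the most frequent ones. The counting scheme and the toy corpus below are illustrative assumptions, not the paper's algorithm.

```python
from collections import Counter

def likely_prepositions(arg_pairs, corpus_sentences, top_n=3):
    """Count prepositions occurring between known argument pairs of a relation."""
    preps = {"of", "by", "with", "for", "in", "on", "from", "to"}
    counts = Counter()
    for head, dependent in arg_pairs:
        for sent in corpus_sentences:
            tokens = sent.lower().split()
            if head in tokens and dependent in tokens:
                lo, hi = sorted((tokens.index(head), tokens.index(dependent)))
                counts.update(t for t in tokens[lo + 1:hi] if t in preps)
    return counts.most_common(top_n)

# Invented corpus for a "performed-by" style relation.
corpus = ["the symphony was composed by beethoven",
          "the portrait was painted by an unknown artist"]
print(likely_prepositions([("composed", "beethoven"), ("painted", "artist")], corpus))
```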


International Conference on Computational Linguistics | 2004

Inferring parts of speech for lexical mappings via the Cyc KB

Tom O'Hara; Stefano Bertolo; Michael J. Witbrock; Bjørn Aldag; Jon Curtis; Kathy Panton; Dave Schneider; Nancy Salay

We present an automatic approach to learning criteria for classifying the parts-of-speech used in lexical mappings. This will further automate our knowledge acquisition system for non-technical users. The criteria for the speech parts are based on the types of the denoted terms along with morphological and corpus-based clues. Associations among these and the parts-of-speech are learned using the lexical mappings contained in the Cyc knowledge base as training data. With over 30 speech parts to choose from, the classifier achieves good results (77.8% correct). Accurate results (93.0%) are achieved in the special case of the mass-count distinction for nouns. Comparable results are also obtained using OpenCyc (73.1% general and 88.4% mass-count).
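The mass-count distinction mentioned above can also be approximated from simple corpus clues, such as which determiners a noun takes ("much water" vs. "many books"). The voting rule below is an illustrative sketch, not the Cyc-trained classifier.

```python
from collections import Counter

MASS_CUES = {"much", "less", "little"}            # determiners favoring mass nouns
COUNT_CUES = {"many", "several", "fewer", "few"}  # determiners favoring count nouns

def guess_mass_or_count(noun, corpus_sentences):
    """Vote on mass vs. count from determiner co-occurrence in a corpus sample."""
    votes = Counter()
    for sent in corpus_sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens[:-1]):
            if tokens[i + 1] == noun:
                if tok in MASS_CUES:
                    votes["mass"] += 1
                elif tok in COUNT_CUES:
                    votes["count"] += 1
    return votes.most_common(1)[0][0] if votes else "unknown"

sample = ["there is not much water left", "too many books on the shelf"]
print(guess_mass_or_count("water", sample), guess_mass_or_count("books", sample))
```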


ElectricDict '04: Proceedings of the Workshop on Enhancing and Using Electronic Dictionaries | 2004

Empirical acquisition of differentiating relations from definitions

Tom O'Hara; Janyce Wiebe

This paper describes a new automatic approach for extracting conceptual distinctions from dictionary definitions. A broad-coverage dependency parser is first used to extract the lexical relations from the definitions. Then the relations are disambiguated using associations learned from tagged corpora. This contrasts with earlier approaches using manually developed rules for disambiguation.
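As a sketch of the first step, a modern dependency parser such as spaCy (used here as a stand-in for the paper's broad-coverage parser) can turn a definition gloss into head-relation-modifier triples, which a later step would disambiguate against corpus-trained associations. The gloss is an invented example.

```python
import spacy  # requires: python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

def lexical_relations(definition):
    """Extract (head lemma, dependency label, modifier lemma) triples from a gloss."""
    doc = nlp(definition)
    return [(tok.head.lemma_, tok.dep_, tok.lemma_)
            for tok in doc if tok.dep_ not in ("punct", "det")]

# Dictionary-style gloss for "knife".
for triple in lexical_relations("a tool with a sharp blade used for cutting"):
    print(triple)
```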


Archive | 2003

Inducing criteria for mass noun lexical mappings using the Cyc KB, and its extension to WordNet

Tom O'Hara; Nancy Salay; Michael J. Witbrock; Dave Schneider; Bjørn Aldag; Stefano Bertolo; Kathy Panton; Fritz Lehmann; Jon Curtis; Matt Smith; David Baxter; Peter Wagner


Archive | 2005

Empirical acquisition of conceptual distinctions via dictionary definitions

Tom O'Hara; Janyce Wiebe

Collaboration


Dive into Tom O'Hara's collaborations.

Top Co-Authors

Janyce Wiebe
University of Pittsburgh

Jon Curtis
New Mexico State University

Kathy Panton
New Mexico State University

Nancy Salay
New Mexico State University

Stefano Bertolo
New Mexico State University

Rebecca F. Bruce
University of North Carolina at Asheville

Dave Schneider
New Mexico State University

Bjørn Aldag
New Mexico State University