Gideon S. Mann
Johns Hopkins University
Publications
Featured research published by Gideon S. Mann.
North American Chapter of the Association for Computational Linguistics | 2003
Gideon S. Mann; David Yarowsky
This paper presents a set of algorithms for distinguishing personal names with multiple real referents in text, based on little or no supervision. The approach utilizes an unsupervised clustering technique over a rich feature space of biographic facts, which are automatically extracted via a language-independent bootstrapping process. The induced clusters of named entities are then partitioned and linked to their real referents via the automatically extracted biographic data. Performance is evaluated both on a test set of hand-labeled multi-referent personal names and via automatically generated pseudonames.
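A minimal sketch of the clustering step is given below; the biographic features, the number of referents, and the use of scikit-learn's agglomerative clustering are illustrative assumptions, not the paper's implementation.

# Cluster mentions of an ambiguous name by extracted biographic features,
# so mentions sharing facts end up assigned to the same referent.
from sklearn.feature_extraction import DictVectorizer
from sklearn.cluster import AgglomerativeClustering

# Hypothetical extracted biographic facts, one dict per mention of "John Smith".
mentions = [
    {"birth_year=1931": 1, "occupation=economist": 1},
    {"occupation=economist": 1, "nationality=british": 1},
    {"occupation=guitarist": 1, "band=the_cure": 1},
    {"birth_year=1959": 1, "occupation=guitarist": 1},
]

X = DictVectorizer(sparse=False).fit_transform(mentions)
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)  # mentions that share biographic facts land in the same cluster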
North American Chapter of the Association for Computational Linguistics | 2001
Gideon S. Mann; David Yarowsky
This paper presents a method for inducing translation lexicons based on transduction models of cognate pairs via bridge languages. Bilingual lexicons within language families are induced using probabilistic string edit distance models. Translation lexicons for arbitrarily distant language pairs are then generated by a combination of these intra-family translation models and one or more cross-family on-line dictionaries. Up to 95% exact-match accuracy is achieved on the target vocabulary (30-68% of inter-family test pairs). Thus, substantial portions of translation lexicons can be generated accurately for languages where no bilingual dictionaries or parallel corpora exist.
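A toy sketch of the bridge-language idea follows; it substitutes plain Levenshtein distance for the paper's trained probabilistic string edit models, and the dictionary entries and the translate helper are invented for illustration.

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

# Hypothetical bridge dictionary: Spanish -> English.
bridge = {"libertad": "freedom", "ciudad": "city", "universidad": "university"}

def translate(source_word):
    # Pick the bridge entry whose source side is the closest cognate.
    best = min(bridge, key=lambda es: edit_distance(source_word, es))
    return bridge[best]

print(translate("liberdade"))  # Portuguese word reaches "freedom" via Spanish "libertad"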
International Conference on Machine Learning | 2007
Gideon S. Mann; Andrew McCallum
Although semi-supervised learning has been an active area of research, its use in deployed applications is still relatively rare because the methods are often difficult to implement, fragile in tuning, or lacking in scalability. This paper presents expectation regularization, a semi-supervised learning method for exponential family parametric models that augments the traditional conditional label-likelihood objective function with an additional term that encourages model predictions on unlabeled data to match certain expectations---such as label priors. The method is extremely easy to implement, scales as well as logistic regression, and can handle non-independent features. We present experiments on five different data sets, showing accuracy improvements over other semi-supervised methods.
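A schematic form of such an objective is sketched below; the notation is assumed from the abstract rather than quoted from the paper, with L the labeled set, U the unlabeled set, \tilde{p} a prior label expectation, and D a divergence such as KL.

% Conditional log-likelihood on labeled data, plus a penalty when the model's
% average predicted label distribution on unlabeled data departs from the prior.
\[
O(\theta) \;=\; \sum_{(x,y) \in L} \log p_\theta(y \mid x)
\;-\; \lambda \, D\!\left( \tilde{p} \,\middle\|\, \frac{1}{|U|} \sum_{x \in U} p_\theta(\cdot \mid x) \right)
\]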
Natural Language Engineering | 2001
Marc Light; Gideon S. Mann; Ellen Riloff; Eric Breck
In this paper, we take a detailed look at the performance of components of an idealized question answering system on two different tasks: the TREC Question Answering task and a set of reading comprehension exams. We carry out three types of analysis: inherent properties of the data, feature analysis, and performance bounds. Based on these analyses we explain some of the performance results of the current generation of Q/A systems and make predictions on future work. In particular, we present four findings: (1) Q/A system performance is correlated with answer repetition; (2) relative overlap scores are more effective than absolute overlap scores; (3) equivalence classes on scoring functions can be used to quantify performance bounds; and (4) perfect answer typing still leaves a great deal of ambiguity for a Q/A system because sentences often contain several items of the same type.
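The distinction in finding (2) can be illustrated with a small sketch; the scoring definitions and the example sentences are illustrative assumptions, not the paper's exact formulas.

def absolute_overlap(question_terms, sentence_terms):
    # Raw count of question terms appearing in the sentence.
    return len(set(question_terms) & set(sentence_terms))

def relative_overlap(question_terms, sentence_terms):
    # Overlap normalized by sentence length, so diffuse sentences score lower.
    sentence = set(sentence_terms)
    return len(set(question_terms) & sentence) / max(len(sentence), 1)

q = "when was the telephone invented".split()
long_sent = "the telephone was one of many devices invented in an era of rapid change".split()
short_sent = "the telephone was invented in 1876".split()

for s in (long_sent, short_sent):
    print(absolute_overlap(q, s), round(relative_overlap(q, s), 2))
# Both sentences tie on absolute overlap (4 shared terms); the relative score
# prefers the short sentence, where question terms dominate.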
International Conference on Computational Linguistics | 2002
Gideon S. Mann
The WordNet lexical ontology, which is primarily composed of common nouns, has been widely used in retrieval tasks. Here, we explore the notion of a fine-grained proper noun ontology and argue for the utility of such an ontology in retrieval tasks. To support this claim, we build a fine-grained proper noun ontology from unrestricted news text and use this ontology to improve performance on a question answering task.
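One illustrative way such instance-of pairs might be mined from unrestricted news text is sketched below; the pattern and the example sentences are assumptions, not the paper's extraction procedure.

import re

text = ("British astronaut Tim Peake returned to Earth. "
        "The award went to violinist Hilary Hahn and economist Paul Romer.")

# Treat a lowercase common noun immediately preceding a capitalized name
# as that name's fine-grained class (e.g. "violinist Hilary Hahn").
pattern = re.compile(r"\b([a-z]+)\s+((?:[A-Z][a-z]+\s?){2})")
ontology = {}
for cls, name in pattern.findall(text):
    ontology.setdefault(cls, set()).add(name.strip())

print(ontology)
# e.g. {'astronaut': {'Tim Peake'}, 'violinist': {'Hilary Hahn'}, 'economist': {'Paul Romer'}}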
North American Chapter of the Association for Computational Linguistics | 2003
David A. Smith; Gideon S. Mann
We present minimally supervised methods for training and testing geographic name disambiguation (GND) systems. We train data-driven place name classifiers using toponyms already disambiguated in the training text --- by such existing cues as Nashville, Tenn. or Springfield, MA --- and test the system on texts where these cues have been stripped out and on hand-tagged historical texts. We experiment on three English-language corpora of varying provenance and complexity: newsfeed from the 1990s, personal narratives from the 19th century American west, and memoirs and records of the U.S. Civil War. Disambiguation accuracy ranges from 87% for news to 69% for some historical collections.
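A toy sketch of the strip-the-cue idea follows; the regular expressions and the two-letter state labels are illustrative assumptions, and the paper's classifiers use far richer context features.

import re

labeled = "Flooding closed schools in Springfield, MA on Monday."

# Harvest a (toponym, referent) training pair from the explicit state cue...
m = re.search(r"\b([A-Z][a-z]+), ([A-Z]{2})\b", labeled)
toponym, state = m.group(1), m.group(2)
print("training pair:", toponym, "->", state)

# ...then strip the cue so a classifier must recover it from context alone.
test_text = re.sub(r", [A-Z]{2}\b", "", labeled)
print("test text:   ", test_text)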
Meeting of the Association for Computational Linguistics | 2005
Gideon S. Mann; David Yarowsky
In this paper, we examine the task of extracting a set of biographic facts about target individuals from a collection of Web pages. We automatically annotate training text with positive and negative examples of fact extractions and train Rote, Naive Bayes, and Conditional Random Field extraction models for fact extraction from individual Web pages. We then propose and evaluate methods for fusing the extracted information across documents to return a consensus answer. A novel cross-field bootstrapping method leverages data interdependencies to yield improved performance.
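As a rough illustration of the fusion step, the sketch below uses simple frequency voting as a stand-in for the consensus methods evaluated in the paper; the extractions are invented.

from collections import Counter

# Hypothetical birth-year extractions for one person, one per Web page.
per_page_extractions = ["1928", "1928", "1982", "1928", None, "1928"]

# Fuse by majority vote over pages that produced an extraction.
votes = Counter(v for v in per_page_extractions if v is not None)
consensus, count = votes.most_common(1)[0]
print(consensus, f"({count}/{sum(votes.values())} pages agree)")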
arXiv: Computation and Language | 2001
Eric Breck; Marc Light; Gideon S. Mann; Ellen Riloff; Brianne Brown; Pranav Anand; Mats Rooth; Michael Thelen
In this paper we analyze two question answering tasks: the TREC-8 question answering task and a set of reading comprehension exams. First, we show that Q/A systems perform better when there are multiple answer opportunities per question. Next, we analyze common approaches to two subproblems: term overlap for answer sentence identification, and answer typing for short answer extraction. We present general tools for analyzing the strengths and limitations of techniques for these sub-problems. Our results quantify the limitations of both term overlap and answer typing to distinguish between competing answer candidates.
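The answer-typing limitation can be seen in a toy example; the candidate list and type labels are invented for illustration.

# Filtering candidates by the expected answer type narrows the field
# but often leaves several candidates of the same type.
candidates = [("Paris", "LOCATION"), ("1889", "DATE"),
              ("Gustave Eiffel", "PERSON"), ("France", "LOCATION")]

expected_type = "LOCATION"   # e.g. for "Where is the Eiffel Tower?"
survivors = [text for text, t in candidates if t == expected_type]
print(survivors)  # ['Paris', 'France'] -- typing alone does not decide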
North American Chapter of the Association for Computational Linguistics | 2007
Gideon S. Mann; Andrew McCallum
Entropy regularization is a straightforward and successful method of semi-supervised learning that augments the traditional conditional likelihood objective function with an additional term that aims to minimize the predicted label entropy on unlabeled data. It has previously been demonstrated to provide positive results in linear-chain CRFs, but the published method for calculating the entropy gradient requires significantly more computation than supervised CRF training. This paper presents a new derivation and dynamic program for calculating the entropy gradient that is significantly more efficient---having the same asymptotic time complexity as supervised CRF training. We also present efficient generalizations of this method for calculating the label entropy of all sub-sequences, which is useful for active learning, among other applications.
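A schematic form of the entropy-regularized objective described above follows; the notation is assumed rather than quoted from the paper, with L the labeled set, U the unlabeled set, and H the Shannon entropy over labelings of an unlabeled sequence x.

% Log-likelihood on labeled sequences, minus a penalty on the predicted
% label entropy of unlabeled sequences.
\[
O(\theta) \;=\; \sum_{(x,y) \in L} \log p_\theta(y \mid x)
\;-\; \lambda \sum_{x \in U} H\big( p_\theta(\cdot \mid x) \big)
\]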
ODQA '01 Proceedings of the workshop on Open-domain question answering - Volume 12 | 2001
Gideon S. Mann
This paper presents a simple, general method for using the Mutual Information (MI) statistic trained on unannotated trivia questions to estimate question class/semantic tag correlation. This MI method and a variety of question classifiers and semantic taggers are used to build short-answer extractors that show improvement over a hand-built match module using a similar question classifier and semantic tagger.
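A small sketch of estimating question-class / semantic-tag association with pointwise mutual information from co-occurrence counts is given below; the counts are invented and the exact statistic used in the paper may differ.

import math

# Hypothetical co-occurrence counts between question classes and answer tags.
pair_count = {("who", "PERSON"): 180, ("who", "LOCATION"): 12,
              ("where", "PERSON"): 8, ("where", "LOCATION"): 150}
total = sum(pair_count.values())

def pmi(q_class, tag):
    # log P(class, tag) / (P(class) P(tag)), estimated from the counts above.
    p_xy = pair_count.get((q_class, tag), 0) / total
    p_x = sum(c for (q, _), c in pair_count.items() if q == q_class) / total
    p_y = sum(c for (_, t), c in pair_count.items() if t == tag) / total
    return math.log(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

print(round(pmi("who", "PERSON"), 2), round(pmi("who", "LOCATION"), 2))
# Positive for the compatible pairing, negative for the incompatible one.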