Branimir Boguraev | Researchain

Publication

Featured research published by Branimir Boguraev.

International Conference on Computational Linguistics | 1996

Anaphora for everyone: pronominal anaphora resolution without a parser

Christopher Kennedy; Branimir Boguraev

We present an algorithm for anaphora resolution which is a modified and extended version of that developed by Lappin and Leass (1994). In contrast to that work, our algorithm does not require in-depth, full, syntactic parsing of text. Instead, with minimal compromise in output quality, the modifications enable the resolution process to work from the output of a part-of-speech tagger, enriched only with annotations of grammatical function of lexical items in the input text stream. Evaluation of the results of our implementation demonstrates that accurate anaphora resolution can be realized within natural language processing frameworks which do not, or cannot, employ robust and reliable parsing components.
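As a rough illustration of the idea in this abstract, the sketch below resolves a pronoun using only part-of-speech tags and grammatical-function labels, with no parse tree. It is a minimal sketch under stated assumptions, not the authors' algorithm: the `Token` fields, the `resolve_pronoun` helper, the salience weights, and the halving of salience per sentence are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

# Illustrative salience weights; assumptions for this sketch,
# not values taken from the paper.
ROLE_WEIGHT = {"subj": 80, "obj": 50, "other": 30}
RECENCY_BONUS = 100  # added when the candidate shares the pronoun's sentence


@dataclass
class Token:
    text: str
    pos: str       # part-of-speech tag, e.g. "NOUN" or "PRON"
    role: str      # grammatical function: "subj", "obj", or "other"
    number: str    # "sg" or "pl"
    gender: str    # "m", "f", or "n"
    sentence: int  # index of the containing sentence


def resolve_pronoun(tokens, pronoun_index):
    """Return the most salient agreeing noun before the pronoun, or None."""
    pronoun = tokens[pronoun_index]
    best, best_score = None, float("-inf")
    for cand in tokens[:pronoun_index]:
        if cand.pos != "NOUN":
            continue
        # Agreement filter: number and gender must match the pronoun.
        if (cand.number, cand.gender) != (pronoun.number, pronoun.gender):
            continue
        distance = pronoun.sentence - cand.sentence
        score = ROLE_WEIGHT.get(cand.role, 0)
        if distance == 0:
            score += RECENCY_BONUS
        # Halve the salience for each sentence boundary crossed.
        score /= 2 ** distance
        if score > best_score:
            best, best_score = cand, score
    return best


tokens = [
    Token("Mary", "NOUN", "subj", "sg", "f", 0),
    Token("book", "NOUN", "obj", "sg", "n", 0),
    Token("shelf", "NOUN", "other", "sg", "n", 0),
    Token("It", "PRON", "subj", "sg", "n", 1),
]
antecedent = resolve_pronoun(tokens, 3)
print(antecedent.text if antecedent else None)
```

The point of the design is that every feature used here (tag, function label, agreement, sentence index) is available from shallow annotation alone.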

International Conference on Computational Linguistics | 2002

Automatic glossary extraction: beyond terminology identification

Youngja Park; Roy J. Byrd; Branimir Boguraev

This paper describes a method for automatically extracting domain-specific glossaries from large document collections. We show that, compared with current text analysis methods for extracting technical terminology from text, our extracted glossaries more successfully support applications requiring knowledge of domain concepts. After presenting our methods, we illustrate the output of GlossEx, our glossary extraction tool, and present an informal evaluation of its performance.

IBM Journal of Research and Development | 2012

Deep parsing in Watson

Michael C. McCord; James W. Murdock; Branimir Boguraev

Two deep parsing components, an English Slot Grammar (ESG) parser and a predicate-argument structure (PAS) builder, provide core linguistic analyses of both the questions and the text content used by IBM Watson™ to find and hypothesize answers. Specifically, these components are fundamental in question analysis, candidate generation, and analysis of passage evidence. As part of the Watson project, ESG was enhanced, and its performance on Jeopardy!™ questions and on established reference data was improved. PAS was built on top of ESG to support higher-level analytics. In this paper, we describe these components and illustrate how they are used in a pattern-based relation extraction component of Watson. We also provide quantitative results of evaluating the component-level performance of ESG parsing.

IBM Journal of Research and Development | 2012

Question analysis: how Watson reads a clue

Adam Lally; John M. Prager; Michael C. McCord; Branimir Boguraev; Siddharth Patwardhan; James Fan; Paul Fodor; Jennifer Chu-Carroll

The first stage of processing in the IBM Watson™ system is to perform a detailed analysis of the question in order to determine what it is asking for and how best to approach answering it. Question analysis uses Watson's parsing and semantic analysis capabilities: a deep Slot Grammar parser, a named entity recognizer, a co-reference resolution component, and a relation extraction component. We apply numerous detection rules and classifiers using features from this analysis to detect critical elements of the question, including: 1) the part of the question that is a reference to the answer (the focus); 2) terms in the question that indicate what type of entity is being asked for (lexical answer types); 3) a classification of the question into one or more of several broad types; and 4) elements of the question that play particular roles that may require special handling, for example, nested subquestions that must be separately answered. We describe how these elements are detected and evaluate the impact of accurate detection on our end-to-end question-answering system accuracy.

Artificial Intelligence | 1993

Lexical knowledge representation and natural language processing

James Pustejovsky; Branimir Boguraev

Traditionally, semantic information in computational lexicons is limited to notions such as selectional restrictions or domain-specific constraints, encoded in a “static” representation. This information is typically used in natural language processing by a simple knowledge manipulation mechanism limited to the ability to match valences of structurally related words. The most advanced device for imposing structure on lexical information is that of inheritance, both at the object (lexical items) and meta (lexical concepts) levels of lexicon. In this paper we argue that this is an impoverished view of a computational lexicon and that, for all its advantages, simple inheritance lacks the descriptive power necessary for characterizing fine-grained distinctions in the lexical semantics of words. We describe a theory of lexical semantics making use of a knowledge representation framework that offers a richer, more expressive vocabulary for lexical information. In particular, by performing specialized inference over the ways in which aspects of knowledge structures of words in context can be composed, mutually compatible and contextually relevant lexical components of words and phrases are highlighted. We discuss the relevance of this view of the lexicon, as an explanatory device accounting for language creativity, as well as a mechanism underlying the implementation of open-ended natural language processing systems. In particular, we demonstrate how lexical ambiguity resolution, now an integral part of the same procedure that creates the semantic interpretation of a sentence itself, becomes a process not of selecting from a pre-determined set of senses, but of highlighting certain lexical properties brought forth by, and relevant to, the current context.

Meeting of the Association for Computational Linguistics | 1987

The Derivation of a Grammatically Indexed Lexicon from the Longman Dictionary of Contemporary English

Branimir Boguraev; Ted Briscoe; John A. Carroll; David M. Carter; Claire Grover

We describe a methodology and associated software system for the construction of a large lexicon from an existing machine-readable (published) dictionary. The lexicon serves as a component of an English morphological and syntactic analyser and contains entries with grammatical definitions compatible with the word and sentence grammar employed by the analyser. We describe a software system with two integrated components. One of these is capable of extracting syntactically rich, theory-neutral lexical templates from a suitable machine-readable source. The second supports interactive and semi-automatic generation and testing of target lexical entries in order to derive a sizeable, accurate and consistent lexicon from the source dictionary, which contains partial (and occasionally inaccurate) information. Finally, we evaluate the utility of the Longman Dictionary of Contemporary English as a suitable source dictionary for the target lexicon.

IBM Journal of Research and Development | 2012

Finding needles in the haystack: search and candidate generation

Jennifer Chu-Carroll; James Fan; Branimir Boguraev; David Carmel; Dafna Sheinwald; Chris Welty

A key phase in the DeepQA architecture is Hypothesis Generation, in which candidate system responses are generated for downstream scoring and ranking. In the IBM Watson™ system, these hypotheses are potential answers to Jeopardy!™ questions and are generated by two components: search and candidate generation. The search component retrieves content relevant to a given question from Watson's knowledge resources. The candidate generation component identifies potential answers to the question from the retrieved content. In this paper, we present strategies developed to use characteristics of Watson's different knowledge sources and to formulate effective search queries against those sources. We further discuss a suite of candidate generation strategies that use various kinds of metadata, such as document titles or anchor texts in hyperlinked documents. We demonstrate that a combination of these strategies brings the correct answer into the candidate answer pool for 87.17% of all the questions in a blind test set, facilitating high end-to-end question-answering performance.
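The 87.17% figure in this abstract is the share of questions whose candidate pool contains the correct answer. A minimal sketch of how such a measure could be computed follows; the `candidate_recall` function and the case-insensitive exact-match rule are assumptions for illustration, not the evaluation procedure used in the paper.

```python
def candidate_recall(candidate_pools, gold_answers):
    """Fraction of questions whose candidate pool contains the gold answer.

    Matching is case-insensitive exact string equality, which is an
    assumption of this sketch rather than the paper's matching rule.
    """
    if len(candidate_pools) != len(gold_answers):
        raise ValueError("need exactly one candidate pool per gold answer")
    if not gold_answers:
        raise ValueError("no questions to evaluate")
    hits = 0
    for pool, gold in zip(candidate_pools, gold_answers):
        normalized = {candidate.lower() for candidate in pool}
        if gold.lower() in normalized:
            hits += 1
    return hits / len(gold_answers)


# Toy data: the gold answer is present in two of the three pools.
pools = [["Paris", "Lyon"], ["Berlin"], ["rome", "Milan"]]
gold = ["paris", "Munich", "Rome"]
print(f"{candidate_recall(pools, gold):.2%}")
```

A measure like this bounds end-to-end accuracy from above, since later scoring and ranking stages can only choose among answers already in the pool.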

IBM Journal of Research and Development | 2012

Structured data and inference in DeepQA

Aditya Kalyanpur; Branimir Boguraev; Siddharth Patwardhan; James W. Murdock; Adam Lally; Chris Welty; John M. Prager; B. Coppola; Achille B. Fokoue-Nkoutche; Lixin Zhang; Yue Pan; Z. M. Qiu

Although the majority of evidence analysis in DeepQA is focused on unstructured information (e.g., natural-language documents), several components in the DeepQA system use structured data (e.g., databases, knowledge bases, and ontologies) to generate potential candidate answers or find additional evidence. Structured data analytics are a natural complement to unstructured methods in that they typically cover a narrower range of questions but are more precise within that range. Moreover, structured data that has formal semantics is amenable to logical reasoning techniques that can be used to provide implicit evidence. The DeepQA system does not contain a single monolithic structured data module; instead, it allows for different components to use and integrate structured and semistructured data, with varying degrees of expressivity and formal specificity. This paper is a survey of DeepQA components that use structured data. Areas in which evidence from structured sources has the most impact include typing of answers, application of geospatial and temporal constraints, and the use of formally encoded a priori knowledge of commonly appearing entity types such as countries and U.S. presidents. We present details of appropriate components and demonstrate their end-to-end impact on the IBM Watson™ system.

IBM Journal of Research and Development | 2012

Relation extraction and scoring in DeepQA

Chang Wang; Aditya Kalyanpur; James Fan; Branimir Boguraev; David Gondek

Detecting semantic relations in text is an active problem area in natural-language processing and information retrieval. For question answering, there are many advantages to detecting relations in the question text because it allows background relational knowledge to be used to generate potential answers or find additional evidence to score supporting passages. This paper presents two approaches to broad-domain relation extraction and scoring in the DeepQA question-answering framework: one based on manual pattern specification, and the other relying on statistical methods for pattern elicitation, which uses a novel transfer learning technique, namely relation topics. These two approaches are complementary; the rule-based approach is more precise and is used by several DeepQA components, but it requires manual effort, which allows for coverage on only a small targeted set of relations (approximately 30). Statistical approaches, on the other hand, automatically learn how to extract semantic relations from the training data and can be applied to detect a large number of relations (approximately 7,000). Although the precision of the statistical relation detectors is not as high as that of the rule-based approach, their overall impact on the system through passage scoring is statistically significant because of their broad coverage of knowledge.
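To make the rule-based side of this contrast concrete, here is a toy sketch of relation extraction from manually specified patterns. It uses plain regular expressions over surface text, which is a simplification chosen for this example; the relation names, the `PATTERNS` table, and `extract_relations` are hypothetical and do not reflect the pattern formalism used in DeepQA.

```python
import re

# One or more capitalized words; a crude stand-in for entity detection.
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"

# Hand-written patterns, one per relation (hypothetical relation names).
PATTERNS = {
    "authorOf": re.compile(rf"(?P<arg2>{NAME}) was written by (?P<arg1>{NAME})"),
    "bornIn": re.compile(rf"(?P<arg1>{NAME}) was born in (?P<arg2>{NAME})"),
}


def extract_relations(text):
    """Return (relation, arg1, arg2) tuples for every pattern match in text."""
    found = []
    for relation, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((relation, match.group("arg1"), match.group("arg2")))
    return found


text = "Hamlet was written by William Shakespeare. Jane Austen was born in Steventon."
for triple in extract_relations(text):
    print(triple)
```

Each new relation needs its own hand-written pattern, which illustrates why this style is precise but covers only a small set of relations.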

Dependable Systems and Networks | 2009

A linguistic analysis engine for natural language use case description and its application to dependability analysis in industrial use cases

Avik Sinha; Amit M. Paradkar; Palani Kumanan; Branimir Boguraev

We present 1) a novel linguistic engine made of configurable linguistic components for understanding natural language use case specifications; and 2) results of the first-of-a-kind large-scale experiment in applying linguistic techniques to industrial use cases. Requirement defects are well known to have adverse effects on the dependability of software systems. While formal techniques are often cited as a remedy for specification errors, natural language remains the predominant mode for specifying requirements. Therefore, for dependable system development, a natural language processing technique is required that can translate natural language textual requirements into validation-ready computer models. In this paper, we present the implementation details of such a technique and the results of applying a prototype implementation of our technique to 80 industrial and academic use case descriptions. We report on the accuracy and effectiveness of our technique. The results of our experiment are very encouraging.
