David Gondek | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where David Gondek is active.

Explore More

Publication

Featured researches published by David Gondek.

Artificial Intelligence | 2013

Watson: beyond jeopardy!

David A. Ferrucci; Anthony Levas; Sugato Bagchi; David Gondek; Erik T. Mueller

This paper presents a vision for applying the Watson technology to health care and describes the steps needed to adapt and improve performance in a new domain. Specifically, it elaborates upon a vision for an evidence-based clinical decision support system, based on the DeepQA technology, that affords exploration of a broad range of hypotheses and their associated evidence, as well as uncovers missing information that can be used in mixed-initiative dialog. It describes the research challenges, the adaptation approach, and finally reports results on the first steps we have taken toward this goal.

Ibm Journal of Research and Development | 2012

Automatic knowledge extraction from documents

James Fan; Aditya Kalyanpur; David Gondek; David A. Ferrucci

Access to a large amount of knowledge is critical for success at answering open-domain questions for DeepQA systems such as IBM Watson™. Formal representation of knowledge has the advantage of being easy to reason with, but acquisition of structured knowledge in open domains from unstructured data is often difficult and expensive. Our central hypothesis is that shallow syntactic knowledge and its implied semantics can be easily acquired and can be used in many areas of a question-answering system. We take a two-stage approach to extract the syntactic knowledge and implied semantics. First, shallow knowledge from large collections of documents is automatically extracted. Second, additional semantics are inferred from aggregate statistics of the automatically extracted shallow knowledge. In this paper, we describe in detail what kind of shallow knowledge is extracted, how it is automatically done from a large corpus, and how additional semantics are inferred from aggregate statistics. We also briefly discuss the various ways extracted knowledge is used throughout the IBM DeepQA system.

Ibm Journal of Research and Development | 2012

A framework for merging and ranking of answers in DeepQA

David Gondek; Adam Lally; Aditya Kalyanpur; James W. Murdock; P. A. Duboue; Lixin Zhang; Yue Pan; Z. M. Qiu; Chris Welty

The final stage in the IBM DeepQA pipeline involves ranking all candidate answers according to their evidence scores and judging the likelihood that each candidate answer is correct. In DeepQA, this is done using a machine learning framework that is phase-based, providing capabilities for manipulating the data and applying machine learning in successive applications. We show how this design can be used to implement solutions to particular challenges that arise in applying machine learning for evidence-based hypothesis evaluation. Our approach facilitates an agile development environment for DeepQA; evidence scoring strategies can be easily introduced, revised, and reconfigured without the need for error-prone manual effort to determine how to combine the various evidence scores. We describe the framework, explain the challenges, and evaluate the gain over a baseline machine learning approach.

Ibm Journal of Research and Development | 2012

Relation extraction and scoring in DeepQA

Chang Wang; Aditya Kalyanpur; James Fan; Branimir Boguraev; David Gondek

Detecting semantic relations in text is an active problem area in natural-language processing and information retrieval. For question answering, there are many advantages of detecting relations in the question text because it allows background relational knowledge to be used to generate potential answers or find additional evidence to score supporting passages. This paper presents two approaches to broad-domain relation extraction and scoring in the DeepQA question-answering framework, i.e., one based on manual pattern specification and the other relying on statistical methods for pattern elicitation, which uses a novel transfer learning technique, i.e., relation topics. These two approaches are complementary; the rule-based approach is more precise and is used by several DeepQA components, but it requires manual effort, which allows for coverage on only a small targeted set of relations (approximately 30). Statistical approaches, on the other hand, automatically learn how to extract semantic relations from the training data and can be applied to detect a large amount of relations (approximately 7,000). Although the precision of the statistical relation detectors is not as high as that of the rule-based approach, their overall impact on the system through passage scoring is statistically significant because of their broad coverage of knowledge.

Ibm Journal of Research and Development | 2012

Typing candidate answers using type coercion

James W. Murdock; Aditya Kalyanpur; Chris Welty; James Fan; David A. Ferrucci; David Gondek; Lixin Zhang; H. Kanayama

Many questions explicitly indicate the type of answer required. One popular approach to answering those questions is to develop recognizers to identify instances of common answer types (e.g., countries, animals, and food) and consider only answers on those lists. Such a strategy is poorly suited to answering questions from the Jeopardy!™ television quiz show. Jeopardy! questions have an extremely broad range of types of answers, and the most frequently occurring types cover only a small fraction of all answers. We present an alternative approach to dealing with answer types. We generate candidate answers without regard to type, and for each candidate, we employ a variety of sources and strategies to judge whether the candidate has the desired type. These sources and strategies provide a set of type coercion scores for each candidate answer. We use these scores to give preference to answers with more evidence of having the right type. Our question-answering system is significantly more accurate with type coercion than it is without type coercion; these components have a combined impact of nearly 5% on the accuracy of the IBM Watson™ question-answering system.

conference on information and knowledge management | 2012

Learning to rank for robust question answering

Arvind Agarwal; Hema Raghavan; Karthik Subbian; Prem Melville; Richard D. Lawrence; David Gondek; James Fan

This paper aims to solve the problem of improving the ranking of answer candidates for factoid based questions in a state-of-the-art Question Answering system. We first provide an extensive comparison of 5 ranking algorithms on two datasets -- from the Jeopardy quiz show and a medical domain. We then show the effectiveness of a cascading approach, where the ranking produced by one ranker is used as input to the next stage. The cascading approach shows sizeable gains on both datasets. We finally evaluate several rank aggregation techniques to combine these algorithms, and find that Supervised Kemeny aggregation is a robust technique that always beats the baseline ranking approach used by Watson for the Jeopardy competition. We further corroborate our results on TREC Question Answering datasets.

Knowledge and Information Systems | 2007

Non-redundant data clustering

David Gondek; Thomas Hofmann

Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice, this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to deal with this problem, we present an extension of the information bottleneck framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints. Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different types of numeric and non-numeric attributes. We discuss extensions of the technique to the tasks of semi-supervised classification and enumeration of successive non-redundant clusterings. We present experimental results for applications in text mining and computer vision.

Ai Magazine | 2010

Building Watson: An Overview of the DeepQA Project

David A. Ferrucci; Eric W. Brown; Jennifer Chu-Carroll; James Fan; David Gondek; Aditya Kalyanpur; Adam Lally; J. William Murdock; Eric Nyberg; John M. Prager; Nico Schlaefer; Christopher A. Welty

Archive | 2010