Sanda M. Harabagiu | Researchain

Publication

Featured research published by Sanda M. Harabagiu.

Meeting of the Association for Computational Linguistics | 2003

Using Predicate-Argument Structures for Information Extraction

Mihai Surdeanu; Sanda M. Harabagiu; John Williams; Paul Aarseth

In this paper we present a novel, customizable IE paradigm that takes advantage of predicate-argument structures. We also introduce a new way of automatically identifying predicate-argument structures, which is central to our IE paradigm. It is based on (1) an extended set of features and (2) inductive decision tree learning. The experimental results confirm our claim that accurate predicate-argument structures enable high-quality IE results.
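As a rough illustration of how predicate-argument structures can drive a customizable IE template, the sketch below maps a predicate and its labeled arguments onto template slots through an editable rules table. The PropBank-style labels (`Arg0`, `Arg1`, `Arg3`), the `ACQUISITION_RULES` table, the slot names, and the helper names are all hypothetical assumptions for illustration; the paper's actual mapping and its decision-tree argument identifier are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PredArgStructure:
    """A predicate with its labeled arguments, e.g. {"Arg0": "Company A"}."""
    predicate: str
    args: dict[str, str] = field(default_factory=dict)


# Hypothetical customization table: for each trigger predicate, which
# argument label fills which slot of an (illustrative) acquisition template.
ACQUISITION_RULES: dict[str, dict[str, str]] = {
    "acquire": {"Arg0": "buyer", "Arg1": "acquired_company"},
    "buy": {"Arg0": "buyer", "Arg1": "acquired_company", "Arg3": "price"},
}


def fill_template(pas: PredArgStructure,
                  rules: dict[str, dict[str, str]]) -> Optional[dict[str, str]]:
    """Fill template slots from a predicate-argument structure.

    Returns None if the predicate does not trigger the template; slots whose
    argument is missing from the structure are simply left out.
    """
    mapping = rules.get(pas.predicate)
    if mapping is None:
        return None
    return {slot: pas.args[label]
            for label, slot in mapping.items() if label in pas.args}


pas = PredArgStructure("buy", {"Arg0": "Company A", "Arg1": "Company B",
                               "Arg3": "$5 million"})
print(fill_template(pas, ACQUISITION_RULES))
# → {'buyer': 'Company A', 'acquired_company': 'Company B', 'price': '$5 million'}
```

In this sketch, swapping in a different rules table retargets the same predicate-argument analysis to a new extraction scenario without retraining the argument identifier.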

ACM Transactions on Information Systems | 2003

Performance issues and error analysis in an open-domain question answering system

Dan I. Moldovan; Marius Pasca; Sanda M. Harabagiu; Mihai Surdeanu

This paper presents an in-depth analysis of a state-of-the-art Question Answering system. Several scenarios are examined: (1) the performance of each module in a serial baseline system, (2) the impact of feedback loops and of inserting a logic prover, and (3) the impact of various retrieval strategies and lexical resources. The main conclusion is that overall performance depends on the depth of the natural language processing resources and on the tools used for answer finding.

International Conference on Computational Linguistics | 2004

Question answering based on semantic structures

Srini Narayanan; Sanda M. Harabagiu

The ability to answer complex questions posed in natural language depends on (1) the depth of the available semantic representations and (2) the inferential mechanisms they support. In this paper we describe a QA architecture in which questions are analyzed and candidate answers are generated by (1) identifying predicate-argument structures and semantic frames in the input and (2) performing structured probabilistic inference over the extracted relations in the context of a domain and scenario model. A novel aspect of our system is a scalable and expressive representation of actions and events based on Coordinated Probabilistic Relational Models (CPRM). We report on the ability of the implemented system to perform several forms of probabilistic and temporal inference to extract answers to complex questions. The results indicate enhanced accuracy over current state-of-the-art QA systems.

North American Chapter of the Association for Computational Linguistics | 2003

COGEX: a logic prover for question answering

Dan I. Moldovan; Christine Clark; Sanda M. Harabagiu; Steven J. Maiorano

Recent TREC results have demonstrated the need for deeper text understanding methods. This paper introduces the idea of automated reasoning applied to question answering and shows the feasibility of integrating a logic prover into a Question Answering system. The approach is to transform questions and answer passages into logic representations. World knowledge axioms as well as linguistic axioms are supplied to the prover, which renders a deep understanding of the relationship between question text and answer text. Moreover, the proof traces provide answer justifications. The results show that the prover boosts the performance of the QA system on TREC questions by 30%.
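COGEX works over first-order logic representations with world-knowledge and linguistic axioms. The toy sketch below is a deliberately simplified stand-in, not COGEX's inference procedure: it uses propositional Horn clauses with forward chaining, only to show how a proof trace can double as an answer justification. The `Axiom` type, the `prove` helper, and the facts and axiom are invented for illustration.

```python
from typing import List, NamedTuple, Optional, Set


class Axiom(NamedTuple):
    """Horn clause: if every premise is known, the conclusion is known."""
    name: str
    premises: frozenset
    conclusion: str


def prove(goal: str, facts: Set[str], axioms: List[Axiom]) -> Optional[List[str]]:
    """Forward-chain from `facts` until `goal` is derived or nothing new follows.

    Returns every derivation step taken, in order (this may include steps
    not needed for the goal), or None if the goal cannot be derived.
    """
    known = set(facts)
    trace: List[str] = []
    changed = True
    while changed and goal not in known:
        changed = False
        for ax in axioms:
            if ax.conclusion not in known and ax.premises <= known:
                known.add(ax.conclusion)
                trace.append(f"{ax.name}: {sorted(ax.premises)} -> {ax.conclusion}")
                changed = True
    return trace if goal in known else None


# Toy logic forms: one fact from an answer passage plus one world-knowledge axiom.
facts = {"bought(john, book)"}
axioms = [Axiom("buy_implies_own", frozenset({"bought(john, book)"}), "owns(john, book)")]
for step in prove("owns(john, book)", facts, axioms):
    print(step)
# → buy_implies_own: ['bought(john, book)'] -> owns(john, book)
```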

International Conference on Computational Linguistics | 2000

Experiments with open-domain textual Question Answering

Sanda M. Harabagiu; Marius Pasca; Steven J. Maiorano

This paper describes the integration of several knowledge-based natural language processing techniques into a Question Answering system capable of mining textual answers from large collections of texts. Surprising quality is achieved when several lightweight knowledge-based NLP techniques complement mostly shallow, surface-based approaches.

Meeting of the Association for Computational Linguistics | 2006

Methods for Using Textual Entailment in Open-Domain Question Answering

Sanda M. Harabagiu; Andrew Hickl

Work on the semantics of questions has argued that the relation between a question and its answer(s) can be cast in terms of logical entailment. In this paper, we demonstrate how computational systems designed to recognize textual entailment can be used to enhance the accuracy of current open-domain automatic question answering (Q/A) systems. In our experiments, we show that when textual entailment information is used to either filter or rank answers returned by a Q/A system, accuracy can be increased by as much as 20% overall.
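The two uses of entailment named in the abstract, filtering and ranking candidate answers, can be sketched as follows. The scorer `overlap_entails` is a hypothetical word-overlap stand-in used only so the example runs (it is not a textual entailment recognizer). The 0.5 threshold, the dictionary layout of candidates, and the way a hypothesis is built from the question and a candidate answer are also assumptions.

```python
from typing import Callable, Dict, List

Answer = Dict[str, str]  # {"answer": ..., "passage": ...}


def overlap_entails(text: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an entailment recognizer: the fraction of
    hypothesis words that also occur in the text. Not a real RTE model."""
    text_words = set(text.lower().split())
    hyp_words = hypothesis.lower().split()
    if not hyp_words:
        return 0.0
    return sum(w in text_words for w in hyp_words) / len(hyp_words)


def filter_answers(answers: List[Answer], hypothesis_of: Callable[[Answer], str],
                   entails: Callable[[str, str], float] = overlap_entails,
                   threshold: float = 0.5) -> List[Answer]:
    """Keep candidates whose passage entails the question-derived hypothesis."""
    return [a for a in answers if entails(a["passage"], hypothesis_of(a)) >= threshold]


def rank_answers(answers: List[Answer], hypothesis_of: Callable[[Answer], str],
                 entails: Callable[[str, str], float] = overlap_entails) -> List[Answer]:
    """Reorder candidates by entailment score, highest first (ties keep input order)."""
    return sorted(answers, key=lambda a: entails(a["passage"], hypothesis_of(a)),
                  reverse=True)


def telephone_hypothesis(a: Answer) -> str:
    """Hypothetical hypothesis for the question "Who invented the telephone?"."""
    return f"{a['answer']} invented the telephone"


candidates = [
    {"answer": "Edison", "passage": "Edison visited a lab"},
    {"answer": "Bell", "passage": "Bell invented the telephone in 1876"},
]
print([a["answer"] for a in filter_answers(candidates, telephone_hypothesis)])  # → ['Bell']
print([a["answer"] for a in rank_answers(candidates, telephone_hypothesis)])    # → ['Bell', 'Edison']
```

Passing a real entailment system as `entails` would leave both helpers unchanged; only the scoring function differs.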

Meeting of the Association for Computational Linguistics | 2000

The structure and performance of an open-domain question answering system

Dan I. Moldovan; Sanda M. Harabagiu; Marius Pasca; Rada Mihalcea; Roxana Girju; Richard Goodrum; Vasile Rus

This paper presents the architecture, operation and results obtained with the LASSO Question Answering system developed in the Natural Language Processing Laboratory at SMU. To find answers, the system relies on a combination of syntactic and semantic techniques. The search for the answer is based on a novel form of indexing called paragraph indexing. A score of 55.5% for short answers and 64.5% for long answers was achieved at the TREC-8 competition.
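The abstract does not describe how paragraph indexing works. Here is a minimal sketch of the general idea, assuming paragraphs separated by blank lines are the unit of indexing and retrieval. An inverted index maps each token to the (document, paragraph) pairs that contain it, and a Boolean AND query returns only the paragraphs containing every keyword. The function names are hypothetical, and this is not LASSO's actual implementation.

```python
from __future__ import annotations

import re
from collections import defaultdict


def build_paragraph_index(docs: dict[str, str]) -> dict[str, set[tuple[str, int]]]:
    """Map each lowercase token to the (doc_id, paragraph_number) pairs containing it.

    Paragraphs are assumed to be separated by blank lines; empty ones are skipped.
    """
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for doc_id, text in docs.items():
        paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
        for i, para in enumerate(paragraphs):
            for token in re.findall(r"\w+", para.lower()):
                index[token].add((doc_id, i))
    return index


def query(index: dict[str, set[tuple[str, int]]], keywords: list[str]) -> set[tuple[str, int]]:
    """Boolean AND retrieval: paragraphs that contain every keyword."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


docs = {
    "d1": "Intro text here.\n\nLASSO used paragraph indexing.",
    "d2": "Paragraph windows help answer extraction.",
}
index = build_paragraph_index(docs)
print(sorted(query(index, ["paragraph"])))              # → [('d1', 1), ('d2', 0)]
print(sorted(query(index, ["paragraph", "indexing"])))  # → [('d1', 1)]
```

Returning paragraphs rather than whole documents keeps the text handed to later answer-extraction stages short.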

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

Topic themes for multi-document summarization

Sanda M. Harabagiu; V. Finley Lacatusu

The problem of using topic representations for multi-document summarization (MDS) has received considerable attention recently. In this paper, we describe five different topic representations and introduce a novel representation of topics based on topic themes. We present eight different methods of generating MDS and evaluate each of these methods on a large set of topics used in past DUC workshops. Our evaluation results show a significant improvement in the quality of summaries based on topic themes over MDS methods that use alternative topic representations.

North American Chapter of the Association for Computational Linguistics | 2001

Text and knowledge mining for coreference resolution

Sanda M. Harabagiu; Răzvan C. Bunescu; Steven J. Maiorano

Traditionally, coreference is resolved by satisfying a combination of salience, syntactic, semantic and discourse constraints. The acquisition of such knowledge is time-consuming, difficult and error-prone. Therefore, we present a knowledge-minimalist methodology for mining coreference rules from annotated text corpora. Semantic consistency evidence, a form of knowledge required for coreference resolution, is easily retrieved from WordNet. Additional consistency knowledge is discovered by a meta-bootstrapping algorithm applied to unlabeled texts.

Journal of the American Medical Informatics Association | 2011

Automatic extraction of relations between medical concepts in clinical texts

Bryan Rink; Sanda M. Harabagiu; Kirk Roberts

OBJECTIVE: To discover relations between medical problems, treatments, and tests mentioned in electronic medical records using a supervised machine learning approach.

MATERIALS AND METHODS: A single support vector machine classifier was used to identify relations between concepts and to assign their semantic type. Several resources, such as Wikipedia, WordNet, General Inquirer, and a relation similarity metric, inform the classifier.

RESULTS: The techniques reported in this paper were evaluated in the 2010 i2b2 Challenge and obtained the highest F1 score for the relation extraction task. When gold-standard data for concepts and assertions were available, F1 was 73.7, precision was 72.0, and recall was 75.3. F1 is defined as 2*Precision*Recall/(Precision+Recall). When concepts and assertions were instead discovered automatically, F1 was 48.4, precision was 57.6, and recall was 41.7.

DISCUSSION: Although a rich set of features was developed for the classifiers presented in this paper, little knowledge mining was performed from medical ontologies such as those found in UMLS. Future studies should incorporate features extracted from such knowledge sources, which we expect to further improve the results. Moreover, each relation was discovered independently; joint classification of relations may further improve the quality of results, as may joint learning of concepts, assertions, and relations.

CONCLUSION: Lexical and contextual features proved to be very important in relation extraction from medical texts: when they are not available to the classifier, the F1 score decreases by 3.7%. Removing the similarity-based features decreases F1 by 1.1%.
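The F1 definition in the abstract can be checked against the reported precision and recall. The published precision and recall are themselves rounded, so recomputing F1 from them can differ from the reported F1 in the last digit. For the gold-standard setting, 2 × 72.0 × 75.3 / (72.0 + 75.3) ≈ 73.6, while the reported value is 73.7. The helper name `f1_score` is an illustrative choice.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in %) as reported in the abstract.
gold = f1_score(72.0, 75.3)       # gold-standard concepts and assertions
automatic = f1_score(57.6, 41.7)  # automatically discovered concepts and assertions
print(f"{gold:.1f} {automatic:.1f}")  # → 73.6 48.4
```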
