Sanda M. Harabagiu
University of Texas at Dallas
Publication
Featured research published by Sanda M. Harabagiu.
Meeting of the Association for Computational Linguistics | 2003
Mihai Surdeanu; Sanda M. Harabagiu; John Williams; Paul Aarseth
In this paper we present a novel, customizable IE paradigm that takes advantage of predicate-argument structures. We also introduce a new way of automatically identifying predicate-argument structures, which is central to our IE paradigm. It is based on: (1) an extended set of features; and (2) inductive decision tree learning. The experimental results support our claim that accurate predicate-argument structures enable high-quality IE results.
ACM Transactions on Information Systems | 2003
Dan I. Moldovan; Marius Pasca; Sanda M. Harabagiu; Mihai Surdeanu
This paper presents an in-depth analysis of a state-of-the-art Question Answering system. Several scenarios are examined: (1) the performance of each module in a serial baseline system, (2) the impact of feedback and the insertion of a logic prover, and (3) the impact of various retrieval strategies and lexical resources. The main conclusion is that the overall performance depends on the depth of natural language processing resources and the tools used for answer finding.
International Conference on Computational Linguistics | 2004
Srini Narayanan; Sanda M. Harabagiu
The ability to answer complex questions posed in Natural Language depends on (1) the depth of the available semantic representations and (2) the inferential mechanisms they support. In this paper we describe a QA architecture where questions are analyzed and candidate answers generated by (1) identifying predicate-argument structures and semantic frames from the input and (2) performing structured probabilistic inference using the extracted relations in the context of a domain and scenario model. A novel aspect of our system is a scalable and expressive representation of actions and events based on Coordinated Probabilistic Relational Models (CPRM). We report on the ability of the implemented system to perform several forms of probabilistic and temporal inferences to extract answers to complex questions. The results indicate enhanced accuracy over current state-of-the-art Q/A systems.
North American Chapter of the Association for Computational Linguistics | 2003
Dan I. Moldovan; Christine Clark; Sanda M. Harabagiu; Steven J. Maiorano
Recent TREC results have demonstrated the need for deeper text understanding methods. This paper introduces the idea of automated reasoning applied to question answering and shows the feasibility of integrating a logic prover into a Question Answering system. The approach is to transform questions and answer passages into logic representations. World knowledge axioms as well as linguistic axioms are supplied to the prover, which renders a deep understanding of the relationship between question text and answer text. Moreover, the trace of the proofs provides answer justifications. The results show that the prover boosts the performance of the QA system on TREC questions by 30%.
International Conference on Computational Linguistics | 2000
Sanda M. Harabagiu; Marius Pasca; Steven J. Maiorano
This paper describes the integration of several knowledge-based natural language processing techniques into a Question Answering system capable of mining textual answers from large collections of texts. Surprising quality is achieved when several lightweight knowledge-based NLP techniques complement mostly shallow, surface-based approaches.
Meeting of the Association for Computational Linguistics | 2006
Sanda M. Harabagiu; Andrew Hickl
Work on the semantics of questions has argued that the relation between a question and its answer(s) can be cast in terms of logical entailment. In this paper, we demonstrate how computational systems designed to recognize textual entailment can be used to enhance the accuracy of current open-domain automatic question answering (Q/A) systems. In our experiments, we show that when textual entailment information is used to either filter or rank answers returned by a Q/A system, accuracy can be increased by as much as 20% overall.
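The two uses of entailment mentioned in this abstract, filtering and ranking candidate answers, can be illustrated with a minimal sketch. This is not the authors' system: `toy_entailment_score` is a hypothetical word-overlap stand-in for a real textual entailment recognizer, and the function names and the threshold value are assumptions chosen for illustration.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def toy_entailment_score(text: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an entailment recognizer.

    Returns the fraction of hypothesis words that also occur in the text.
    A real system would use a trained textual entailment model instead.
    """
    text_words = set(text.lower().split())
    hyp_words = hypothesis.lower().split()
    if not hyp_words:
        return 0.0
    return sum(w in text_words for w in hyp_words) / len(hyp_words)


def filter_answers(question: str, answers: List[str],
                   scorer: Scorer, threshold: float = 0.5) -> List[str]:
    """Keep only the answers whose score against the question meets the threshold."""
    return [a for a in answers if scorer(a, question) >= threshold]


def rank_answers(question: str, answers: List[str], scorer: Scorer) -> List[str]:
    """Order answers from highest to lowest score (ties keep input order)."""
    return sorted(answers, key=lambda a: scorer(a, question), reverse=True)


if __name__ == "__main__":
    question = "who wrote hamlet"
    candidates = [
        "paris is in france",
        "hamlet is a play",
        "shakespeare wrote hamlet in 1600",
    ]
    print(filter_answers(question, candidates, toy_entailment_score))
    print(rank_answers(question, candidates, toy_entailment_score))
```

Filtering discards low-scoring candidates outright, while ranking keeps every candidate and only reorders them; which is preferable depends on how reliable the entailment scores are.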
Meeting of the Association for Computational Linguistics | 2000
Dan I. Moldovan; Sanda M. Harabagiu; Marius Pasca; Rada Mihalcea; Roxana Girju; Richard Goodrum; Vasile Rus
This paper presents the architecture, operation and results obtained with the LASSO Question Answering system developed in the Natural Language Processing Laboratory at SMU. To find answers, the system relies on a combination of syntactic and semantic techniques. The search for the answer is based on a novel form of indexing called paragraph indexing. A score of 55.5% for short answers and 64.5% for long answers was achieved at the TREC-8 competition.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005
Sanda M. Harabagiu; V. Finley Lacatusu
The problem of using topic representations for multi-document summarization (MDS) has received considerable attention recently. In this paper, we describe five different topic representations and introduce a novel representation of topics based on topic themes. We present eight different methods of generating MDS and evaluate each of these methods on a large set of topics used in past DUC workshops. Our evaluation results show a significant improvement in the quality of summaries based on topic themes over MDS methods that use alternative topic representations.
North American Chapter of the Association for Computational Linguistics | 2001
Sanda M. Harabagiu; Rǎzvan C. Bunescu; Steven J. Maiorano
Traditionally, coreference is resolved by satisfying a combination of salience, syntactic, semantic and discourse constraints. The acquisition of such knowledge is time-consuming, difficult and error-prone. Therefore, we present a knowledge-minimalist methodology of mining coreference rules from annotated text corpora. Semantic consistency evidence, which is a form of knowledge required by coreference, is easily retrieved from WordNet. Additional consistency knowledge is discovered by a meta-bootstrapping algorithm applied to unlabeled texts.
Journal of the American Medical Informatics Association | 2011
Bryan Rink; Sanda M. Harabagiu; Kirk Roberts
OBJECTIVE: A supervised machine learning approach to discover relations between medical problems, treatments, and tests mentioned in electronic medical records.
MATERIALS AND METHODS: A single support vector machine classifier was used to identify relations between concepts and to assign their semantic type. Several resources such as Wikipedia, WordNet, General Inquirer, and a relation similarity metric inform the classifier.
RESULTS: The techniques reported in this paper were evaluated in the 2010 i2b2 Challenge and obtained the highest F1 score for the relation extraction task. When gold standard data for concepts and assertions were available, F1 was 73.7, precision was 72.0, and recall was 75.3. F1 is defined as 2*Precision*Recall/(Precision+Recall). Alternatively, when concepts and assertions were discovered automatically, F1 was 48.4, precision was 57.6, and recall was 41.7.
DISCUSSION: Although a rich set of features was developed for the classifiers presented in this paper, little knowledge mining was performed from medical ontologies such as those found in UMLS. Future studies should incorporate features extracted from such knowledge sources, which we expect to further improve the results. Moreover, each relation discovery was treated independently. Joint classification of relations may further improve the quality of results. Joint learning of the discovery of concepts, assertions, and relations may also improve the results of automatic relation extraction.
CONCLUSION: Lexical and contextual features proved to be very important in relation extraction from medical texts. When they are not available to the classifier, the F1 score decreases by 3.7%. In addition, the F1 score decreases by 1.1% when similarity-based features are not available.
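The F1 definition quoted in this abstract, 2*Precision*Recall/(Precision+Recall), can be written as a short function. This is a minimal sketch: the name `f1_score` and the zero-denominator guard are illustrative choices, not part of the paper. Because the abstract reports precision and recall rounded to one decimal place, recomputing F1 from those rounded values may differ from a published F1 in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2*P*R / (P + R).

    Returns 0.0 when both inputs are zero, to avoid dividing by zero
    (this guard is an assumption of the sketch, not stated in the abstract).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall reported for automatically discovered
    # concepts and assertions.
    print(round(f1_score(57.6, 41.7), 1))  # prints 48.4
```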