Smaranda Muresan
Rutgers University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Smaranda Muresan.
language resources and evaluation | 2002
Judith L. Klavans; Smaranda Muresan
This paper describes a method toward automatically building dictionaries from text. We present DEFINDER, a rule-based system for extraction of definitions from on-line consumer-oriented medical articles. We provide an extensive evaluation on three dimensions: i) performance of the definition extraction technique in terms of precision and recall, ii) quality of the built dictionary as judged both by specialists and lay users, iii) coverage of existing on-line dictionaries. The corpus we used for the study is publicly available. A major contribution of the paper is the range of quantitative and qualitative evaluation methods.
american medical informatics association annual symposium | 2000
Judith L. Klavans; Smaranda Muresan
INTRODUCTION The problem addressed in this paper concerns the automatic identification and extraction of medical terms along with their definitions and modifiers from full text consumer-oriented medical articles. The system, DEFINDER (Definition Finder), uses rule-based techniques. The output of our system can be used in several applications: creation and/or enhancement of on-line terminological resources, summarization and text categorization according to level of expertise, e.g. lay vs. technical.
meeting of the association for computational linguistics | 2014
Debanjan Ghosh; Smaranda Muresan; Nina Wacholder; Mark Aakhus; Matthew Mitsui
Argument mining of online interactions is in its infancy. One reason is the lack of annotated corpora in this genre. To make progress, we need to develop a principled and scalable way of determining which portions of texts are argumentative and what is the nature of argumentation. We propose a two-tiered approach to achieve this goal and report on several initial studies to assess its potential.
american medical informatics association annual symposium | 2001
Judith L. Klavans; Smaranda Muresan
In this paper we present a quantitative and qualitative evaluation of DEFINDER, a rule-based system that mines consumer-oriented full text articles in order to extract definitions and the terms they define. The quantitative evaluation shows that in terms of precision and recall as measured against human performance, DEFINDER obtained 87% and 75% respectively, thereby revealing the incompleteness of existing resources and the ability of DEFINDER to address these gaps. Our basis for comparison is definitions from on-line dictionaries, including the UMLS Metathesaurus. Qualitative evaluation shows that the definitions extracted by our system are ranked higher in terms of user-centered criteria of usability and readability than are definitions from on-line specialized dictionaries. The output of DEFINDER can be used to enhance these dictionaries. DEFINDER output is being incorporated in a system to clarify technical terms for non-specialist users in understandable non-technical language.
conference on computational natural language learning | 2001
Smaranda Muresan; Evelyne Tzoukermann; Judith L. Klavans
This paper shows that linguistic techniques along with machine learning can extract high quality noun phrases for the purpose of providing the gist or summary of email messages. We describe a set of comparative experiments using several machine learning algorithms for the task of salient noun phrase extraction. Three main conclusions can be drawn from this study: (i) the modifiers of a noun phrase can be semantically as important as the head for the task of gisting, (ii) linguistic filtering improves the performance of machine learning algorithms, (iii) a combination of classifiers improves accuracy.
acm/ieee joint conference on digital libraries | 2001
Judith L. Klavans; Smaranda Muresan
In this paper we present DEFINDER, a rule-based system that mines cons umer-oriented full text articles in order to extract definitions and the terms they define. This research is part of Digital Library Project at Columbia University, entitled PERSIVAL (PErsonalized Retrieval and Summarization of Image, Video and Language resources) [5]. One goal of the project is to present information to patients in language they can understand. A key component of this stage is to provide accurate and readable lay definitions for technical terms, which may be present in articles of intermediate complexity. The focus of this short paper is on quantitative and qualitative evaluation of the DEFINDER system [3]. Our basis for comparison was definitions from Unified Medical Language System (UMLS), On-line Medical Dictionary (OMD) and Glossary of Popular and Technical Medical Terms (GPTMT). Quantitative evaluations show that DEFINDER obtained 87% precision and 75% recall and reveal the incompleteness of existing resources and the ability of DEFINDER to address gaps. Qualitative evaluation shows that the definitions extracted by our system are ranked higher in terms of user-based criteria of usability and readability than definitions from on-line specialized dictionaries. Thus the output of DEFINDER can be used to enhance existing specialized dictionaries, and also as a key feature in summarizing technical articles for non-specialist users.
empirical methods in natural language processing | 2015
Debanjan Ghosh; Weiwei Guo; Smaranda Muresan
Sarcasm is generally characterized as a figure of speech that involves the substitution of a literal by a figurative meaning, which is usually the opposite of the original literal meaning. We re-frame the sarcasm detection task as a type of word sense disambiguation problem, where the sense of a word is either literal or sarcastic. We call this the Literal/Sarcastic Sense Disambiguation (LSSD) task. We address two issues: 1) how to collect a set of target words that can have either literal or sarcastic meanings depending on context; and 2) given an utterance and a target word, how to automatically detect whether the target word is used in the literal or the sarcastic sense. For the latter, we investigate several distributional semantics methods and show that a Support Vector Machines (SVM) classifier with a modified kernel using word embeddings achieves a 7-10% F1 improvement over a strong lexical baseline.
robotics: science and systems | 2015
James MacGlashan; Monica Babeş-Vroman; Marie desJardins; Michael L. Littman; Smaranda Muresan; Shawn Squire; Stefanie Tellex; Dilip Arumugam; Lei Yang
As intelligent robots become more prevalent, methods to make interaction with the robots more accessible are increasingly important. Communicating the tasks that a person wants the robot to carry out via natural language, and training the robot to ground the natural language through demonstration, are especially appealing approaches for interaction, since they do not require a technical background. However, existing approaches map natural language commands to robot command languages that directly express the sequence of actions the robot should execute. This sequence is often specific to a particular situation and does not generalize to new situations. To address this problem, we present a system that grounds natural language commands into reward functions using demonstrations of different natural language commands being carried out in the environment. Because language is grounded to reward functions, rather than explicit actions that the robot can perform, commands can be high-level, carried out in novel environments autonomously, and even transferred to other robots with different action spaces. We demonstrate that our learned model can be both generalized to novel environments and transferred to a robot with a different action space than the action space used during training.
human language technology | 2001
Evelyne Tzoukermann; Smaranda Muresan; Judith L. Klavans
We present a system for the automatic extraction of salient information from email messages, thus providing the gist of their meaning. Dealing with email raises several challenges that we address in this paper: heterogeneous data in terms of length and topic. Our method combines shallow linguistic processing with machine learning to extract phrasal units that are representative of email content. The GIST-IT application is fully implemented and embedded in an active mailbox platform. Evaluation was performed over three machine learning paradigms.
international joint conference on artificial intelligence | 2011
Smaranda Muresan
The paper addresses the problem of learning to parse sentences to logical representations of their underlying meaning, by inducing a syntactic-semantic grammar. The approach uses a class of grammars which has been proven to be learnable from representative examples. In this paper, we introduce tractable learning algorithms for learning this class of grammars, comparing them in terms of a-priori knowledge needed by the learner, hypothesis space and algorithm complexity. We present experimental results on learning tense, aspect, modality and negation of verbal constructions.