Graciela Rosemblat
National Institutes of Health
Publication
Featured research published by Graciela Rosemblat.
Bioinformatics | 2012
Halil Kilicoglu; Dongwook Shin; Marcelo Fiszman; Graciela Rosemblat; Thomas C. Rindflesch
SUMMARY Effective access to the vast biomedical knowledge present in the scientific literature is challenging. Semantic relations are increasingly used in knowledge management applications supporting biomedical research to help address this challenge. We describe SemMedDB, a repository of semantic predications (subject-predicate-object triples) extracted from the entire set of PubMed citations. We propose the repository as a knowledge resource that can assist in hypothesis generation and literature-based discovery in biomedicine as well as in clinical decision-making support. AVAILABILITY AND IMPLEMENTATION The SemMedDB repository is available as a MySQL database for non-commercial use at http://skr3.nlm.nih.gov/SemMedDB. A UMLS Metathesaurus license is required.
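To make the idea of a predication repository concrete, the following is a minimal sketch of storing and querying subject-predicate-object triples. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the table name, column names, citation identifiers, and rows are all hypothetical; it is not the SemMedDB schema.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL distribution.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified table: one row per predication found in a citation.
conn.execute(
    "CREATE TABLE predication (citation TEXT, subject TEXT, predicate TEXT, object TEXT)"
)

# Toy rows for illustration only; they are not taken from SemMedDB.
conn.executemany(
    "INSERT INTO predication VALUES (?, ?, ?, ?)",
    [
        ("citation-1", "Aspirin", "TREATS", "Headache"),
        ("citation-2", "Aspirin", "TREATS", "Fever"),
        ("citation-3", "Aspirin", "TREATS", "Headache"),
        ("citation-4", "Ibuprofen", "TREATS", "Fever"),
    ],
)


def objects_of(connection, subject, predicate):
    """Return the distinct objects asserted for a subject and predicate, sorted."""
    rows = connection.execute(
        "SELECT DISTINCT object FROM predication "
        "WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]


print(objects_of(conn, "Aspirin", "TREATS"))
```

Keeping the citation identifier on every row is what would let an application link each relation back to its source text.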
Information services & use | 2011
Thomas C. Rindflesch; Halil Kilicoglu; Marcelo Fiszman; Graciela Rosemblat; Dongwook Shin
To support more effective biomedical information management, Semantic MEDLINE integrates document retrieval, advanced natural language processing, automatic summarization and visualization into a single Web portal. The application is intended to help manage the results of PubMed searches by condensing core semantic content in the citations retrieved. Output is presented as a connected graph of semantic relations, with links to the original MEDLINE citations. The ability to connect salient information across documents helps users keep up with the research literature and discover connections which might otherwise go unnoticed. Semantic MEDLINE can make an impact on biomedicine by supporting scientific discovery and the timely translation of insights from basic research into advances in clinical practice and patient care. Semantic MEDLINE is illustrated here with recent research on the clock genes.
BMC Bioinformatics | 2011
Halil Kilicoglu; Graciela Rosemblat; Marcelo Fiszman; Thomas C. Rindflesch
BACKGROUND Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology. RESULTS We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by 12% (0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations. CONCLUSIONS While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase points out that an acceptable level of agreement can be achieved in multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. Annotating predications involving biomolecular entities and processes is particularly challenging. While the resulting gold standard is mainly intended to serve as a test collection for our semantic interpreter, we believe that the lessons learned are applicable generally.
Sleep | 2012
Christopher M. Miller; Thomas C. Rindflesch; Marcelo Fiszman; Dimitar Hristovski; Dongwook Shin; Graciela Rosemblat; Han Zhang; Kingman P. Strohl
STUDY OBJECTIVES Sleep quality commonly diminishes with age, and, further, aging men often exhibit a wider range of sleep pathologies than women. We used a freely available, web-based discovery technique (Semantic MEDLINE) supported by semantic relationships to automatically extract information from MEDLINE titles and abstracts. DESIGN We assumed that testosterone is associated with sleep (the A-C relationship in the paradigm) and looked for a mechanism to explain this association (B explanatory link) as a potential or partial mechanism underpinning the etiology of eroded sleep quality in aging men. MEASUREMENTS AND RESULTS Review of full-text papers in critical nodes discovered in this manner resulted in the proposal that testosterone enhances sleep by inhibiting cortisol. Using this discovery method, we posit, and could confirm as a novel hypothesis, cortisol as part of a mechanistic link elucidating the observed correlation between decreased testosterone in aging men and diminished sleep quality. CONCLUSIONS This approach is publicly available and useful not only in this manner but also to generate from the literature alternative explanatory models for observed experimental results.
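The A-B-C paradigm described in this abstract can be illustrated with a minimal sketch: given a set of predications, look for intermediate concepts B that are linked from A and linked to C. The triples and predicate names below are toy data invented for illustration, and the function is a simplified stand-in, not the Semantic MEDLINE implementation.

```python
from collections import defaultdict

# Hypothetical toy predications (subject, predicate, object); not the study's data.
PREDICATIONS = [
    ("Testosterone", "INHIBITS", "Cortisol"),
    ("Cortisol", "AFFECTS", "Sleep"),
    ("Testosterone", "AFFECTS", "Muscle mass"),
    ("Caffeine", "AFFECTS", "Sleep"),
]


def linking_concepts(predications, a, c):
    """Return sorted B concepts for which both A -> B and B -> C are asserted."""
    outgoing = defaultdict(set)
    for subject, _predicate, obj in predications:
        outgoing[subject].add(obj)
    # A concept B qualifies if A points to it and it points to C.
    return sorted(b for b in outgoing.get(a, set()) if c in outgoing.get(b, set()))


print(linking_concepts(PREDICATIONS, "Testosterone", "Sleep"))
```

In practice the candidate B concepts only suggest where to read; as the abstract notes, the proposed mechanism came from reviewing the full-text papers behind the critical nodes.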
Journal of Biomedical Informatics | 2011
Han Zhang; Marcelo Fiszman; Dongwook Shin; Christopher M. Miller; Graciela Rosemblat; Thomas C. Rindflesch
Automatic summarization has been proposed to help manage the results of biomedical information retrieval systems. Semantic MEDLINE, for example, summarizes semantic predications representing assertions in MEDLINE citations. Results are presented as a graph which maintains links to the original citations. Graphs summarizing more than 500 citations are hard to read and navigate, however. We exploit graph theory for focusing these large graphs. The method is based on degree centrality, which measures connectedness in a graph. Four categories of clinical concepts related to treatment of disease were identified and presented as a summary of input text. A baseline was created using term frequency of occurrence. The system was evaluated on summaries for treatment of five diseases compared to a reference standard produced manually by two physicians. The results showed that recall for system results was 72%, precision was 73%, and F-score was 0.72. The system F-score was considerably higher than that for the baseline (0.47).
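As a rough illustration of the two quantities mentioned in this abstract, the sketch below computes normalized degree centrality on a small undirected graph and the balanced F-score from precision and recall. The graph is toy data, and treating the reported F-score as the balanced (harmonic mean) form is an assumption, though it is consistent with the reported figures.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each node divided by (n - 1), for an undirected simple graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        # Centrality is undefined for fewer than two nodes; report zero.
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy graph of concepts linked by predications.
EDGES = [
    ("Drug A", "Disease X"),
    ("Drug B", "Disease X"),
    ("Drug C", "Disease X"),
    ("Drug A", "Gene G"),
]

centrality = degree_centrality(EDGES)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])

# Precision 0.73 and recall 0.72, as reported in the abstract.
print(round(f_score(0.73, 0.72), 2))
```

Nodes with high centrality are the natural candidates to keep when a large summary graph has to be reduced to its most connected concepts.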
Journal of Biomedical Informatics | 2014
Rui Zhang; Michael J. Cairelli; Marcelo Fiszman; Graciela Rosemblat; Halil Kilicoglu; Thomas C. Rindflesch; Serguei V. S. Pakhomov; Genevieve B. Melton
In this study we report on potential drug-drug interactions between drugs occurring in patient clinical data. Results are based on relationships in SemMedDB, a database of structured knowledge extracted from all MEDLINE citations (titles and abstracts) using SemRep. The core of our methodology is to construct two potential drug-drug interaction schemas, based on relationships extracted from SemMedDB. In the first schema, Drug1 and Drug2 interact through Drug1's effect on some gene, which in turn affects Drug2. In the second, Drug1 affects Gene1, while Drug2 affects Gene2. Gene1 and Gene2, together, then have an effect on some biological function. After checking each drug pair from the medication lists of each of 22 patients, we found 19 known and 62 unknown drug-drug interactions using both schemas. For example, our results suggest that the interaction of Lisinopril, an ACE inhibitor commonly prescribed for hypertension, and the antidepressant sertraline can potentially increase the likelihood and possibly the severity of psoriasis. We also assessed the relationships extracted by SemRep from a linguistic perspective and found that the precision of SemRep was 0.58 for 300 randomly selected sentences from MEDLINE. Our study demonstrates that the use of structured knowledge in the form of relationships from the biomedical literature can support the discovery of potential drug-drug interactions occurring in patient clinical data. Moreover, SemMedDB provides a good knowledge resource for expanding the range of drugs, genes, and biological functions considered as elements in various drug-drug interaction pathways.
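The two interaction schemas can be sketched as simple pattern matches over sets of relations. Everything below is hypothetical: the drug, gene, and function names are placeholders, the relations are toy data, and requiring Gene1 and Gene2 to be distinct in the second schema is an assumption rather than something the abstract states.

```python
# Hypothetical toy relations; not extracted from SemMedDB.
DRUG_AFFECTS_GENE = {("DrugA", "GeneX"), ("DrugB", "GeneY")}
GENE_AFFECTS_DRUG = {("GeneX", "DrugB")}
GENE_AFFECTS_FUNCTION = {("GeneX", "FunctionF"), ("GeneY", "FunctionF")}


def schema_one(drug1, drug2):
    """Schema 1: Drug1 affects some gene, which in turn affects Drug2."""
    return sorted(
        gene
        for drug, gene in DRUG_AFFECTS_GENE
        if drug == drug1 and (gene, drug2) in GENE_AFFECTS_DRUG
    )


def schema_two(drug1, drug2):
    """Schema 2: Drug1 affects Gene1, Drug2 affects Gene2, both genes affect one function."""
    genes1 = {gene for drug, gene in DRUG_AFFECTS_GENE if drug == drug1}
    genes2 = {gene for drug, gene in DRUG_AFFECTS_GENE if drug == drug2}
    hits = []
    for gene1, function1 in GENE_AFFECTS_FUNCTION:
        for gene2, function2 in GENE_AFFECTS_FUNCTION:
            # Assumption: the two genes must differ and share the same function.
            if gene1 in genes1 and gene2 in genes2 and gene1 != gene2 and function1 == function2:
                hits.append((gene1, gene2, function1))
    return sorted(hits)


print(schema_one("DrugA", "DrugB"))
print(schema_two("DrugA", "DrugB"))
```

Running both functions over every drug pair in a medication list would yield candidate interactions, which then need checking against known interactions and the supporting citations.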
World Congress on Medical and Health Informatics, MEDINFO | 2010
Qing Zeng-Treitler; Hyeoneui Kim; Graciela Rosemblat; Alla Keselman
With the development of electronic personal health records, more patients are gaining access to their own medical records. However, comprehension of medical record content remains difficult for many patients. Because each record is unique, it is also prohibitively costly to employ human translators to solve this problem. In this study, we investigated whether multilingual machine translation could help make medical record content more comprehensible to patients who lack proficiency in the language of the records. We used a popular general-purpose machine translation tool called Babel Fish to translate 213 medical record sentences from English into Spanish, Chinese, Russian and Korean. We evaluated the comprehensibility and accuracy of the translation. The text characteristics of the incorrectly translated sentences were also analyzed. In each language, the majority of the translations were incomprehensible (76% to 92%) and/or incorrect (77% to 89%). The main causes of the translation errors are vocabulary difficulty and syntactical complexity. A general-purpose machine translation tool like Babel Fish is not adequate for the translation of medical records; however, a machine translation tool can potentially be improved significantly if it is trained to target certain narrow domains in medicine.
Journal of the Association for Information Science and Technology | 2010
Alla Keselman; Graciela Rosemblat; Halil Kilicoglu; Marcelo Fiszman; Honglan Jin; Dongwook Shin; Thomas C. Rindflesch
A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums. Emotions seem to be frequently important in these texts for expressing friendship, showing social support or as part of online arguments. Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behavior to the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially oriented, designed to identify opinions about products rather than user behaviors. This article partly fills this gap with a new algorithm, SentiStrength, to extract sentiment strength from informal English text, using new methods to exploit the de facto grammars and spelling styles of cyberspace. Applied to MySpace comments and with a lookup table of term sentiment strengths optimized by machine learning, SentiStrength is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1–5. The former, but not the latter, is better than baseline and a wide range of general machine learning approaches.
Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015) | 2015
Halil Kilicoglu; Graciela Rosemblat; Michael J. Cairelli; Thomas C. Rindflesch
We propose a compositional method to assess the factuality of biomedical events extracted from the literature. The composition procedure relies on the notion of semantic embedding and a fine-grained classification of extra-propositional phenomena, including modality and valence shifting, and a dictionary based on this classification. The event factuality is computed as a product of the extra-propositional operators that have scope over the event. We evaluate our approach on the GENIA event corpus enriched with certainty level and polarity annotations. The results indicate that our approach is effective in identifying the certainty level component of factuality and is less successful in recognizing the other element, negative polarity.
Journal of Biomedical Informatics | 2013
Graciela Rosemblat; Dongwook Shin; Halil Kilicoglu; Charles Sneiderman; Thomas C. Rindflesch
We describe a domain-independent methodology to extend SemRep coverage beyond the biomedical domain. SemRep, a natural language processing application originally designed for biomedical texts, uses the knowledge sources provided by the Unified Medical Language System (UMLS®). Ontological and terminological extensions to the system are needed in order to support other areas of knowledge. We extended SemRep's application by developing a semantic representation of a previously unsupported domain. This was achieved by adapting well-known ontology engineering phases and integrating them with the UMLS knowledge sources on which SemRep crucially depends. While the process to extend SemRep coverage has been successfully applied in earlier projects, this paper presents in detail the step-wise approach we followed and the mechanisms implemented. A case study in the field of medical informatics illustrates how the ontology engineering phases have been adapted for optimal integration with the UMLS. We provide qualitative and quantitative results, which indicate the validity and usefulness of our methodology.