Lyudmila Shagina
Columbia University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Lyudmila Shagina.
Journal of the American Medical Informatics Association | 2004
Carol Friedman; Lyudmila Shagina; Yves A. Lussier; George Hripcsak
OBJECTIVE The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. METHODS An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. RESULTS Recall of the system for UMLS coding of all terms was .77 (95% CI.72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. CONCLUSION Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.
Journal of the American Medical Informatics Association | 1999
Carol Friedman; George Hripcsak; Lyudmila Shagina; Hongfang Liu
OBJECTIVE To design a document model that provides reliable and efficient access to clinical information in patient reports for a broad range of clinical applications, and to implement an automated method using natural language processing that maps textual reports to a form consistent with the model. METHODS A document model that encodes structured clinical information in patient reports while retaining the original contents was designed using the extensible markup language (XML), and a document type definition (DTD) was created. An existing natural language processor (NLP) was modified to generate output consistent with the model. Two hundred reports were processed using the modified NLP system, and the XML output that was generated was validated using an XML validating parser. RESULTS The modified NLP system successfully processed all 200 reports. The output of one report was invalid, and 199 reports were valid XML forms consistent with the DTD. CONCLUSIONS Natural language processing can be used to automatically create an enriched document that contains a structured component whose elements are linked to portions of the original textual report. This integrated document model provides a representation where documents containing specific information can be accurately and efficiently retrieved by querying the structured components. If manual review of the documents is desired, the salient information in the original reports can also be identified and highlighted. Using an XML model of tagging provides an additional benefit in that software tools that manipulate XML documents are readily available.
Bioinformatics | 2006
Carol Friedman; Tara Borlawsky; Lyudmila Shagina; H. Rosie Xing; Yves A. Lussier
MOTIVATION Natural language processing (NLP) techniques are increasingly being used in biology to automate the capture of new biological discoveries in text, which are being reported at a rapid rate. Yet, information represented in NLP data structures is classically very different from information organized with ontologies as found in model organisms or genetic databases. To facilitate the computational reuse and integration of information buried in unstructured text with that of genetic databases, we propose and evaluate a translational schema that represents a comprehensive set of phenotypic and genetic entities, as well as their closely related biomedical entities and relations as expressed in natural language. In addition, the schema connects different scales of biological information, and provides mappings from the textual information to existing ontologies, which are essential in biology for integration, organization, dissemination and knowledge management of heterogeneous phenotypic information. A common comprehensive representation for otherwise heterogeneous phenotypic and genetic datasets, such as the one proposed, is critical for advancing systems biology because it enables acquisition and reuse of unprecedented volumes of diverse types of knowledge and information from text. RESULTS A novel representational schema, PGschema, was developed that enables translation of phenotypic, genetic and their closely related information found in textual narratives to a well-defined data structure comprising phenotypic and genetic concepts from established ontologies along with modifiers and relationships. Evaluation for coverage of a selected set of entities showed that 90% of the information could be represented (95% confidence interval: 86-93%; n = 268). Moreover, PGschema can be expressed automatically in an XML format using natural language techniques to process the text. To our knowledge, we are providing the first evaluation of a translational schema for NLP that contains declarative knowledge about genes and their associated biomedical data (e.g. phenotypes). AVAILABILITY http://zellig.cpmc.columbia.edu/PGschema
Journal of Biomedical Informatics | 2003
Carol Friedman; Hongfang Liu; Lyudmila Shagina
Medical terminologies are critical for automated healthcare systems. Some terminologies, such as the UMLS and SNOMED are comprehensive, whereas others specialize in limited domains (i.e., BIRADS) or are developed for specific applications. An important feature of a terminology is comprehensive coverage of relevant clinical terms and ease of use by users, which include computerized applications. We have developed a method for facilitating vocabulary development and maintenance that is based on utilization of natural language processing to mine large collections of clinical reports in order to obtain information on terminology as expressed by physicians. Once the reports are processed and the terms structured and collected into an XML representational schema, it is possible to determine information about terms, such as frequency of occurrence, compositionality, relations to other terms (such as modifiers), and correspondence to a controlled vocabulary. This paper describes the method and discusses how it can be used as a tool to help vocabulary builders navigate through the terms physicians use, visualize their relations to other terms via a flexible viewer, and determine their correspondence to a controlled vocabulary.
Journal of Biomedical Informatics | 2005
Eneida A. Mendonça; Janet P. Haas; Lyudmila Shagina; Elaine Larson; Carol Friedman
american medical informatics association annual symposium | 1999
Carol Friedman; Charles Knirsch; Lyudmila Shagina; George Hripcsak
american medical informatics association annual symposium | 2001
Carol Friedman; Hongfang Liu; Lyudmila Shagina; Stephen B. Johnson; George Hripcsak
american medical informatics association annual symposium | 2001
Yves A. Lussier; Lyudmila Shagina; Carol Friedman
american medical informatics association annual symposium | 2000
Yves A. Lussier; Lyudmila Shagina; Carol Friedman
american medical informatics association annual symposium | 1996
Carol Friedman; Lyudmila Shagina; Socrates A. Socratous; Xiao Zeng