Meliha Yetisgen-Yildiz
University of Washington
Publication
Featured research published by Meliha Yetisgen-Yildiz.
International Conference on Knowledge Capture | 2003
Wanda Pratt; Meliha Yetisgen-Yildiz
The explosive growth in the biomedical literature has made it difficult for researchers to keep up with advancements, even in their own narrow specializations. In addition, this current volume of information has created barriers that prevent researchers from exploring connections to their own work from other parts of the literature. Although potentially useful connections might permeate the literature, they will remain buried without new kinds of tools to help researchers capture new knowledge that bridges gaps across distinct sections of the literature. In this paper, we present LitLinker, a system that incorporates knowledge-based methodologies, natural-language processing techniques, and a data-mining algorithm to mine the biomedical literature for new, potential causal links between biomedical terms. Our results from a well-known text-mining example show that LitLinker can capture these novel, interesting connections in an open-ended fashion, with less manual intervention than in previous systems.
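The abstract above does not spell out LitLinker's algorithm, so the following is only a minimal sketch of one common formulation of literature-based discovery, usually called the ABC or open-discovery model: starting from a term A, collect linking terms B that co-occur with A, then propose target terms C that co-occur with some B but never directly with A. The function name `open_discovery`, the document representation (one set of terms per document), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def open_discovery(docs, start_term):
    """Rank candidate target terms for start_term (ABC-style sketch).

    docs: iterable of sets of terms, one set per document (hypothetical input).
    Returns (target_term, linking_terms) pairs, best-supported first.
    """
    # Build a symmetric co-occurrence map: term -> terms seen in the same document.
    cooccur = defaultdict(set)
    for terms in docs:
        for term in terms:
            cooccur[term].update(terms - {term})

    # B terms: everything that co-occurs directly with the start term.
    linking = cooccur[start_term]

    # C terms: reachable through some B term, but never seen with the start term.
    support = defaultdict(set)
    for b in linking:
        for c in cooccur[b]:
            if c != start_term and c not in linking:
                support[c].add(b)

    # More linking terms means stronger support; ties are broken alphabetically.
    return sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Toy corpus, invented for illustration only.
toy_docs = [
    {"migraine", "spreading depression"},
    {"migraine", "platelet aggregation"},
    {"magnesium", "spreading depression"},
    {"magnesium", "platelet aggregation"},
    {"aspirin", "platelet aggregation"},
]
candidates = open_discovery(toy_docs, "migraine")
```

A real system would work over concepts extracted from the literature rather than raw term sets, and would likely use a statistical correlation measure instead of a simple count of linking terms.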
Journal of Biomedical Informatics | 2009
Meliha Yetisgen-Yildiz; Wanda Pratt
While medical researchers formulate new hypotheses to test, they need to identify connections to their work from other parts of the medical literature. However, the current volume of information has become a great barrier for this task. Recently, many literature-based discovery (LBD) systems have been developed to help researchers identify new knowledge that bridges gaps across distinct sections of the medical literature. Each LBD system uses different methods for mining the connections from text and ranking the identified connections, but none of the currently available LBD evaluation approaches can be used to compare the effectiveness of these methods. In this paper, we present an evaluation methodology for LBD systems that allows comparisons across different systems. We demonstrate the abilities of our evaluation methodology by using it to compare the performance of different correlation-mining and ranking approaches used by existing LBD systems. This evaluation methodology should help other researchers compare approaches, make informed algorithm choices, and ultimately help to improve the performance of LBD systems overall.
Journal of Biomedical Informatics | 2013
Meliha Yetisgen-Yildiz; Martin L. Gunn; Fei Xia; Thomas H. Payne
Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. The absence of an automated system to identify and track radiology recommendations is an important barrier to ensuring timely follow-up of patients, especially those with non-acute incidental findings on imaging examinations. In this paper, we present a text processing pipeline to automatically identify clinically important recommendation sentences in radiology reports. Our extraction pipeline is based on natural language processing (NLP) and supervised text classification methods. To develop and test the pipeline, we created a corpus of 800 radiology reports double-annotated for recommendation sentences by a radiologist and an internist. We ran several experiments to measure the impact of different feature types and the data imbalance between positive and negative recommendation sentences. Our fully statistical approach achieved the best F-score, 0.758, in identifying the critical recommendation sentences in radiology reports.
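For readers unfamiliar with the F-score reported above, here is a minimal sketch of how it is computed from sentence-level counts. The function name `f_score` and the example counts are hypothetical; they are not taken from the paper.

```python
def f_score(true_positives, false_positives, false_negatives, beta=1.0):
    """F-beta score from raw counts; returns 0.0 when there are no true positives."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts: 80 recommendation sentences found correctly,
# 20 flagged in error, 40 missed.
example = f_score(true_positives=80, false_positives=20, false_negatives=40)
```

Because recommendation sentences are rare relative to other sentences, the F-score on the positive class is more informative than plain accuracy, which a classifier could inflate by labeling every sentence negative.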
Journal of the American Medical Informatics Association | 2012
Cosmin Adrian Bejan; Fei Xia; Lucy Vanderwende; Mark M. Wurfel; Meliha Yetisgen-Yildiz
OBJECTIVE: This paper describes a natural language processing system for the task of pneumonia identification. Based on the information extracted from the narrative reports associated with a patient, the task is to identify whether or not the patient is positive for pneumonia. DESIGN: A binary classifier was employed to identify pneumonia from a dataset of multiple types of clinical notes created for 426 patients during their stay in the intensive care unit. For this purpose, three types of features were considered: (1) word n-grams, (2) Unified Medical Language System (UMLS) concepts, and (3) assertion values associated with pneumonia expressions. System performance was greatly increased by a feature selection approach which uses statistical significance testing to rank features based on their association with the two categories of pneumonia identification. RESULTS: Besides testing our system on the entire cohort of 426 patients (unrestricted dataset), we also used a smaller subset of 236 patients (restricted dataset). The performance of the system was compared with the results of a baseline previously proposed for these two datasets. The best results achieved by the system (85.71 and 81.67 F1-measure) are significantly better than the baseline results (50.70 and 49.10 F1-measure) on the restricted and unrestricted datasets, respectively. CONCLUSION: Using a statistical feature selection approach that allows the feature extractor to consider only the most informative features from the feature space significantly improves the performance over a baseline that uses all the features from the same feature space. Extracting the assertion value for pneumonia expressions further improves the system performance.
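The abstract above does not name the significance test used for feature ranking. As an illustration only, the sketch below assumes a chi-square test on a 2x2 contingency table (feature present or absent versus pneumonia positive or negative); the function names and the toy patient data are hypothetical.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 table.

    n11: positive cases with the feature, n10: negative cases with it,
    n01: positive cases without it,       n00: negative cases without it.
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_features(cases, labels):
    """Rank features by association with the label, strongest first.

    cases: list of feature sets, one per patient; labels: list of booleans.
    """
    pos_total = sum(labels)
    neg_total = len(labels) - pos_total
    pos_counts, neg_counts = Counter(), Counter()
    for features, label in zip(cases, labels):
        (pos_counts if label else neg_counts).update(features)

    scores = {}
    for feature in set(pos_counts) | set(neg_counts):
        n11, n10 = pos_counts[feature], neg_counts[feature]
        scores[feature] = chi_square(n11, n10, pos_total - n11, neg_total - n10)
    # Highest statistic first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Toy data, invented for illustration.
toy_cases = [{"infiltrate", "fever"}, {"infiltrate", "cough"}, {"cough"}, {"fever", "clear"}]
toy_labels = [True, True, False, False]
ranking = rank_features(toy_cases, toy_labels)
```

A classifier would then be trained on only the top-ranked features, which is the general idea the abstract credits for the improvement over a baseline that uses the full feature space.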
Journal of Biomedical Informatics | 2013
Cosmin Adrian Bejan; Lucy Vanderwende; Fei Xia; Meliha Yetisgen-Yildiz
This paper describes an approach to assertion classification and an empirical study on the impact this task has on phenotype identification, a real-world application in the clinical domain. The task of assertion classification is to assign to each medical concept mentioned in a clinical report (e.g., pneumonia, chest pain) a specific assertion category (e.g., present, absent, or possible). To improve the classification of medical assertions, we propose several new features that capture the semantic properties of special cue words highly indicative of a specific assertion category. The results obtained outperform the current state-of-the-art results for this task. Furthermore, we confirm the intuition that assertion classification contributes significantly to improving the results of phenotype identification from free-text clinical records.
Archive | 2008
Meliha Yetisgen-Yildiz; Wanda Pratt
Evaluating discovery systems is a fundamentally challenging task because if they are successful, by definition they are capturing new knowledge that has yet to be proven useful. To overcome this difficulty, many researchers in literature-based discovery (LBD) replicated Swanson's discoveries to evaluate the performance of their systems. They reported overall success if one of the discoveries generated by their system was the same as Swanson's discovery. This type of evaluation is powerful yet incomplete because it does not inform us about the quality of the rest of the discoveries identified by the system, nor does it test the generalizability of the results. Recently, alternative evaluation methods have been designed to provide more information on the overall performance of the systems. The purpose of this chapter is to review and analyze the current evaluation methods for LBD systems and to discuss potential ways to use these evaluation methods for comparing performance of different systems, rather than reporting the performance of only one system. We will also summarize the current approaches used to evaluate the graphical user interfaces of LBD systems.
eGEMs (Generating Evidence & Methods to improve patient outcomes) | 2013
Emily Beth Devine; Daniel Capurro; Erik G. Van Eaton; Rafael Alfonso-Cristancho; Allison Devlin; N. David Yanez; Meliha Yetisgen-Yildiz; David R. Flum; Peter Tarczy-Hornoch
Background: The field of clinical research informatics includes creation of clinical data repositories (CDRs) used to conduct quality improvement (QI) activities and comparative effectiveness research (CER). Ideally, CDR data are accurately and directly abstracted from disparate electronic health records (EHRs), across diverse health systems. Objective: Investigators from Washington State’s Surgical Care Outcomes and Assessment Program (SCOAP) Comparative Effectiveness Research Translation Network (CERTAIN) are creating such a CDR. This manuscript describes the automation and validation methods used to create this digital infrastructure. Methods: SCOAP is a QI benchmarking initiative. Data are manually abstracted from EHRs and entered into a data management system. CERTAIN investigators are now deploying Caradigm’s Amalga™ tool to facilitate automated abstraction of data from multiple, disparate EHRs. Concordance is calculated to compare automatically abstracted data with manually abstracted data. Performance measures are calculated between Amalga and each parent EHR. Validation takes place in repeated loops, with improvements made over time. When automated abstraction reaches the current benchmark for abstraction accuracy (95%), it will ‘go-live’ at each site. Progress to Date: A technical analysis was completed at 14 sites. Five sites are contributing; the remaining sites prioritized meeting Meaningful Use criteria. Participating sites are contributing 15–18 unique data feeds, totaling 13 surgical registry use cases. Common feeds are registration, laboratory, transcription/dictation, radiology, and medications. Approximately 50% of 1,320 designated data elements are being automatically abstracted: 25% from structured data and 25% from text mining. Conclusion: In semi-automating data abstraction and conducting a rigorous validation, CERTAIN investigators will semi-automate data collection to conduct QI and CER, while advancing the Learning Healthcare System.
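The validation loop described above compares automated abstraction against manual abstraction and checks the result against a 95% benchmark. The abstract does not define the concordance measure, so the sketch below assumes simple percent agreement over the manually abstracted elements; the function names, field names, and values are hypothetical.

```python
GO_LIVE_THRESHOLD = 0.95  # abstraction accuracy benchmark stated in the abstract


def concordance(manual, automated):
    """Fraction of manually abstracted elements that the automated value matches.

    manual, automated: dicts mapping a data element name to its abstracted value.
    An element missing from the automated output counts as a mismatch.
    """
    if not manual:
        raise ValueError("no manually abstracted elements to compare")
    matches = sum(
        1 for key, value in manual.items()
        if key in automated and automated[key] == value
    )
    return matches / len(manual)


def ready_to_go_live(manual, automated, threshold=GO_LIVE_THRESHOLD):
    """True when concordance meets or exceeds the benchmark."""
    return concordance(manual, automated) >= threshold


# Hypothetical record with four data elements; one disagrees.
manual_record = {"age": 54, "asa_class": 2, "procedure": "colectomy", "smoker": False}
automated_record = {"age": 54, "asa_class": 2, "procedure": "colectomy", "smoker": True}
agreement = concordance(manual_record, automated_record)
```

In practice the comparison would be aggregated across many records and broken down per data element, so that each validation loop can target the elements with the lowest agreement.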
Conference on Computer Supported Cooperative Work | 2012
Jina Huh; Andrea Hartzler; Sean A. Munson; Nicholas R. Anderson; Kelly Edwards; John L. Gore; David W. McDonald; Jim O'Leary; Andrea A. Parker; Derek Streat; Meliha Yetisgen-Yildiz; Mark S. Ackerman; Wanda Pratt
Researchers and practitioners show increasing interest in utilizing patient-generated information on the Web. Although the HCI and CSCW communities have provided many exciting opportunities for exploring new ideas and building broad agenda in health, few venues offer a platform for interdisciplinary and collaborative brainstorming about design challenges and opportunities in this space. The goal of this workshop is to provide participants with opportunities to interact with stakeholders from diverse backgrounds and practices - researchers, practitioners, designers, programmers, and ethnographers - and together generate tangible design outcomes that utilize patient-generated information on the Web. Through small multidisciplinary group work, we will provide participants with new collaboration opportunities, understanding of the state of the art, inspiration for future work, and ideally avenues for continuing to develop research and design ideas generated at the workshop.
Computational Intelligence Methods for Bioinformatics and Biostatistics | 2014
Anna Korhonen; Yufan Guo; Simon Baker; Meliha Yetisgen-Yildiz; Ulla Stenius; Masashi Narita; Pietro Liò
Automated literature-based discovery (LBD) generates new knowledge by combining what is already known in the literature. By facilitating large-scale hypothesis testing and generation from huge collections of literature, LBD could significantly support research in the biomedical sciences. However, the uptake of LBD by the scientific community has been limited. One of the key reasons for this is the limited nature of existing LBD methodology. Based on fairly shallow methods, current LBD captures only some of the information available in the literature. We discuss how advanced text mining based on information retrieval, natural language processing, and data mining could open the doors to much deeper, wider-coverage, and dynamic LBD that is better capable of evolving with science, in particular when combined with sophisticated, state-of-the-art knowledge discovery techniques.
International Conference on Knowledge Capture | 2007
Meliha Yetisgen-Yildiz; Wanda Pratt
In this paper, we propose a new method to extract the meaning of medical concept correlations from MEDLINE abstract sentences. Our method incorporates a medical knowledge base, natural language processing approaches, and text classification methods. We describe how we automatically created the training sets and report the results of our initial experiments.