Dongwook Shin
National Institutes of Health
Publication
Featured research published by Dongwook Shin.
Bioinformatics | 2012
Halil Kilicoglu; Dongwook Shin; Marcelo Fiszman; Graciela Rosemblat; Thomas C. Rindflesch
SUMMARY Effective access to the vast biomedical knowledge present in the scientific literature is challenging. Semantic relations are increasingly used in knowledge management applications supporting biomedical research to help address this challenge. We describe SemMedDB, a repository of semantic predications (subject-predicate-object triples) extracted from the entire set of PubMed citations. We propose the repository as a knowledge resource that can assist in hypothesis generation and literature-based discovery in biomedicine as well as in clinical decision-making support. AVAILABILITY AND IMPLEMENTATION The SemMedDB repository is available as a MySQL database for non-commercial use at http://skr3.nlm.nih.gov/SemMedDB. A UMLS Metathesaurus license is required. CONTACT [email protected].
Information services & use | 2011
Thomas C. Rindflesch; Halil Kilicoglu; Marcelo Fiszman; Graciela Rosemblat; Dongwook Shin
To support more effective biomedical information management, Semantic MEDLINE integrates document retrieval, advanced natural language processing, automatic summarization and visualization into a single Web portal. The application is intended to help manage the results of PubMed searches by condensing core semantic content in the citations retrieved. Output is presented as a connected graph of semantic relations, with links to the original MEDLINE citations. The ability to connect salient information across documents helps users keep up with the research literature and discover connections which might otherwise go unnoticed. Semantic MEDLINE can make an impact on biomedicine by supporting scientific discovery and the timely translation of insights from basic research into advances in clinical practice and patient care. Semantic MEDLINE is illustrated here with recent research on the clock genes.
Sleep | 2012
Christopher M. Miller; Thomas C. Rindflesch; Marcelo Fiszman; Dimitar Hristovski; Dongwook Shin; Graciela Rosemblat; Han Zhang; Kingman P. Strohl
STUDY OBJECTIVES Sleep quality commonly diminishes with age, and, further, aging men often exhibit a wider range of sleep pathologies than women. We used a freely available, web-based discovery technique (Semantic MEDLINE) supported by semantic relationships to automatically extract information from MEDLINE titles and abstracts. DESIGN We assumed that testosterone is associated with sleep (the A-C relationship in the paradigm) and looked for a mechanism to explain this association (B explanatory link) as a potential or partial mechanism underpinning the etiology of eroded sleep quality in aging men. MEASUREMENTS AND RESULTS Review of full-text papers in critical nodes discovered in this manner resulted in the proposal that testosterone enhances sleep by inhibiting cortisol. Using this discovery method, we posit, and could confirm as a novel hypothesis, cortisol as part of a mechanistic link elucidating the observed correlation between decreased testosterone in aging men and diminished sleep quality. CONCLUSIONS This approach is publicly available and useful not only in this manner but also to generate from the literature alternative explanatory models for observed experimental results.
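The A-B-C paradigm in the abstract above can be sketched as a search for intermediate concepts B that are the object of a predication about A and the subject of a predication about C. The following is a minimal sketch, not the Semantic MEDLINE implementation: the function name `find_linking_concepts` and the toy predications are assumptions made for illustration, and the data is not SemMedDB output.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples. Illustrative only: these are
# hand-written assumptions, not predications extracted by any real system.
PREDICATIONS = [
    ("testosterone", "INHIBITS", "cortisol"),
    ("cortisol", "DISRUPTS", "sleep"),
    ("testosterone", "AFFECTS", "muscle mass"),
    ("caffeine", "DISRUPTS", "sleep"),
]


def find_linking_concepts(predications, a, c):
    """Return sorted concepts B with some (a, _, B) and some (B, _, c)."""
    objects_of = defaultdict(set)
    for subj, _pred, obj in predications:
        objects_of[subj].add(obj)
    # B must be an object of A and must itself have C among its objects.
    return sorted(
        b for b in objects_of.get(a, set())
        if c in objects_of.get(b, set())
    )


links = find_linking_concepts(PREDICATIONS, "testosterone", "sleep")
print(links)
```

In this toy data the only concept linking testosterone to sleep is cortisol, which mirrors the shape of the hypothesis described in the abstract; a real search would also filter by predicate type and rank candidates.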
Journal of Biomedical Informatics | 2011
Han Zhang; Marcelo Fiszman; Dongwook Shin; Christopher M. Miller; Graciela Rosemblat; Thomas C. Rindflesch
Automatic summarization has been proposed to help manage the results of biomedical information retrieval systems. Semantic MEDLINE, for example, summarizes semantic predications representing assertions in MEDLINE citations. Results are presented as a graph which maintains links to the original citations. Graphs summarizing more than 500 citations are hard to read and navigate, however. We exploit graph theory for focusing these large graphs. The method is based on degree centrality, which measures connectedness in a graph. Four categories of clinical concepts related to treatment of disease were identified and presented as a summary of input text. A baseline was created using term frequency of occurrence. The system was evaluated on summaries for treatment of five diseases compared to a reference standard produced manually by two physicians. The results showed that recall for system results was 72%, precision was 73%, and F-score was 0.72. The system F-score was considerably higher than that for the baseline (0.47).
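Degree centrality, as named in the abstract above, scores each node by how many other nodes it is directly connected to. The sketch below uses the common normalization degree / (n - 1) on an undirected graph; the node labels, the `top_k` helper, and the toy edge list are hypothetical, and the abstract does not state which normalization or cutoff the authors used.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to degree / (n - 1) for an undirected simple graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def top_k(centrality, k):
    """Return the k most central nodes, ties broken alphabetically."""
    return sorted(centrality, key=lambda node: (-centrality[node], node))[:k]


# Hypothetical concept graph; each edge stands for one semantic relation.
EDGES = [
    ("drug_A", "disease_X"),
    ("drug_A", "disease_Y"),
    ("drug_A", "symptom_Z"),
    ("drug_B", "disease_X"),
]

scores = degree_centrality(EDGES)
print(top_k(scores, 2))
```

Keeping only the highest-scoring nodes is one simple way to focus a large graph on its most connected concepts.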
conference on information and knowledge management | 1999
Hyunchul Jang; Young Il Kim; Dongwook Shin
Indexing and retrieval of structured documents have drawn increasing attention because they make it easy to retrieve and access a specific part of a document. So far, several methods have been proposed for settings in which documents rarely change. These can be applied to books or journals held in libraries, but they hardly work for documents in the business domain that change frequently. This paper aims to enable incremental updates of indices whenever parts of documents change. To this end, it employs the index-organized table developed for full-text retrieval in Oracle. It creates several index-organized tables that are essential to implementing the Bottom Up Scheme strategy, which was developed for manipulating structured documents efficiently. An experiment shows that the technique presented here adds little index overhead beyond that of the index-organized table itself. In addition, index updates are performed quickly as soon as parts of documents are changed.
PLOS Computational Biology | 2014
Guocai Chen; Michael J. Cairelli; Halil Kilicoglu; Dongwook Shin; Thomas C. Rindflesch
Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.
BMC Bioinformatics | 2013
Han Zhang; Marcelo Fiszman; Dongwook Shin; Bartłomiej Wilkowski; Thomas C. Rindflesch
BACKGROUND Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). RESULTS SemRep is used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings. CONCLUSIONS For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10% better than the baseline (43% versus 33%). When compared to the reference standard from MeSH headings, the results for recall, precision and F-score were 0.64, 0.65, and 0.65 respectively.
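The clique-identification step mentioned in the abstract above can be illustrated with a standard maximal-clique enumeration. The abstract does not say which algorithm the authors used, so the sketch below swaps in the basic Bron-Kerbosch procedure (without pivoting) as an assumption; the helper names and the four-node toy graph are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # a clique never needs self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))  # nothing can extend this clique
            return
        for v in list(candidates):
            expand(current | {v},
                   candidates & adjacency[v],
                   excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


# Hypothetical argument graph: a, b, c are mutually connected; d hangs off c.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
print(maximal_cliques(build_adjacency(EDGES)))
```

Cliques that share arguments (here both contain `c`) are the kind of overlap that a later clustering step could group into themes.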
Knowledge and Information Systems | 2001
Dongwook Shin
XML DTD (Document Type Declaration) puts two distinctive entities (attribute and element content) together into one framework for representing different document features. The notion of attribute in the XML DTD is similar to the field representation in the database, whereas the element content corresponds to the full text. In this paper, we view these two entities as different, each of which requires a different model for storage and retrieval. Attributes are stored in a database system, whereas the element contents and their indices are saved in files. We present a technique that puts together those two in an efficient way and builds an XML retrieval system on top of that. Such a system can achieve a reasonable trade-off between performance and cost in indexing and retrieval.
world congress on medical and health informatics, medinfo | 2010
Marcelo Fiszman; Bruce E. Bray; Dongwook Shin; Halil Kilicoglu; Glen C. Bennett; Olivier Bodenreider; Thomas C. Rindflesch
Clinical practice guidelines are used to disseminate best practice to clinicians. Successful guidelines depend on literature that is both relevant to the questions posed and based on high quality research in accordance with evidence-based medicine. Meeting these standards requires extensive manual review. We describe a system that combines symbolic semantic processing with a statistical method for selecting both relevant and high quality studies. We focused on a cardiovascular risk factor guideline, and the overall performance of the system was 56% recall, 91% precision (F0.5-score 0.81). If quality of the evidence is not taken into account, performance drops to 62% recall, 79% precision (F0.5-score 0.75). We suggest that this system can potentially improve the efficiency of the literature review process in guideline development.
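The F0.5-scores reported above follow from the general F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where beta = 0.5 weights precision more heavily than recall. The short check below recomputes both reported values from the stated precision and recall; the function name `f_beta` is an assumption for illustration.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F-beta score)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values taken from the abstract: (precision, recall) with beta = 0.5.
with_quality = f_beta(0.91, 0.56, beta=0.5)
without_quality = f_beta(0.79, 0.62, beta=0.5)
print(round(with_quality, 2), round(without_quality, 2))
```

This explains why the score drops when evidence quality is ignored even though recall rises from 56% to 62%: with beta = 0.5, the fall in precision from 91% to 79% outweighs the gain in recall.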
Journal of the Association for Information Science and Technology | 2010
Alla Keselman; Graciela Rosemblat; Halil Kilicoglu; Marcelo Fiszman; Honglan Jin; Dongwook Shin; Thomas C. Rindflesch
A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums. Emotions seem to be frequently important in these texts for expressing friendship, showing social support or as part of online arguments. Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behavior to the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially oriented, designed to identify opinions about products rather than user behaviors. This article partly fills this gap with a new algorithm, SentiStrength, to extract sentiment strength from informal English text, using new methods to exploit the de facto grammars and spelling styles of cyberspace. Applied to MySpace comments and with a lookup table of term sentiment strengths optimized by machine learning, SentiStrength is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1–5. The former, but not the latter, is better than baseline and a wide range of general machine learning approaches.