Publications


Featured research published by Matthew S. Simpson.


Mining Text Data | 2012

Biomedical Text Mining: A Survey of Recent Progress

Matthew S. Simpson; Dina Demner-Fushman

The biomedical community makes extensive use of text mining technology. In the past several years, enormous progress has been made in developing tools and methods, and the community has been witness to some exciting developments. Although the state of the community is regularly reviewed, the sheer volume of work related to biomedical text mining and the rapid pace in which progress continues to be made make this a worthwhile, if not necessary, endeavor. This chapter provides a brief overview of the current state of text mining in the biomedical domain. Emphasis is placed on the resources and tools available to biomedical researchers and practitioners, as well as the major text mining tasks of interest to the community. These tasks include the recognition of explicit facts from biomedical literature, the discovery of previously unknown or implicit facts, document summarization, and question answering. For each topic, its basic challenges and methods are outlined and recent and influential work is reviewed.


Journal of Computing Science and Engineering | 2012

Design and Development of a Multimodal Biomedical Information Retrieval System

Dina Demner-Fushman; Sameer K. Antani; Matthew S. Simpson; George R. Thoma

The search for relevant and actionable information is a key to achieving clinical and research goals in biomedicine. Biomedical information exists in different forms: as text and illustrations in journal articles and other documents, in images stored in databases, and as patients’ cases in electronic health records. This paper presents ways to move beyond conventional text-based searching of these resources, by combining text and visual features in search queries and document representation. A combination of techniques and tools from the fields of natural language processing, information retrieval, and content-based image retrieval allows the development of building blocks for advanced information services. Such services enable searching by textual as well as visual queries, and retrieving documents enriched by relevant images, charts, and other illustrations from the journal literature, patient records and image databases.
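As an illustration of one simple way textual and visual evidence can be combined at query time, the sketch below performs a late fusion of text-retrieval and content-based image-retrieval scores. The scores, weights, and document names are invented; the system described above is far more elaborate than this weighted-sum rule.

```python
# Late fusion: combine normalized text-retrieval and CBIR scores for each
# candidate document. All document names, scores, and weights are illustrative.

def normalize(scores):
    """Min-max normalize a dict of doc -> score into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def fuse(text_scores, image_scores, w_text=0.7):
    """Rank documents by a weighted sum of normalized text and image scores."""
    t, v = normalize(text_scores), normalize(image_scores)
    docs = set(t) | set(v)
    return sorted(
        ((w_text * t.get(d, 0.0) + (1 - w_text) * v.get(d, 0.0), d) for d in docs),
        reverse=True,
    )

text_scores = {"doc1": 2.1, "doc2": 0.4, "doc3": 1.5}   # e.g. BM25-style scores
image_scores = {"doc1": 0.1, "doc2": 0.9, "doc3": 0.8}  # e.g. visual similarity
ranking = fuse(text_scores, image_scores)
```

Here "doc3", middling on both modalities, outranks "doc1", which is strong on text alone; that complementarity is the motivation for combining the two feature types.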


International Journal of Medical Informatics | 2009

Annotation and retrieval of clinically relevant images

Dina Demner-Fushman; Sameer K. Antani; Matthew S. Simpson; George R. Thoma

PURPOSE: Medical images are a significant information source for clinical decision-making. Currently available information retrieval and decision support systems rely primarily on the text of scientific publications to find evidence in support of clinical information needs. The images and illustrations are available only within the full text of a scientific publication and do not directly contribute evidence to such systems. Our first goal is to explore whether image features facilitate finding relevant images that appear in publications. Our second goal is to find promising approaches for providing clinical evidence at the point of service, leveraging information contained in the text and images.

METHODS: We studied two approaches to finding illustrative evidence: a supervised machine-learning approach, in which images are classified as being relevant to an information need or not, and a pipeline information retrieval approach, in which images were retrieved using associated text and then re-ranked using content-based image retrieval (CBIR) techniques.

RESULTS: Our information retrieval approach did not benefit from combining textual and image information. However, given sufficient training data for the machine-learning approach, we achieved 56% average precision at 94% recall using textual features, and 27% average precision at 86% recall using image features. Combining these classifiers resulted in improvement up to 81% precision at 96% recall (74% recall at 85% precision, on average) for the requests with over 180 positive training examples.

CONCLUSIONS: Our supervised machine-learning methods that combine information from image and text are capable of achieving image annotation and retrieval accuracy acceptable for providing clinical evidence, given sufficient training data.
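The classifier combination reported above can be sketched minimally as follows. Averaging the two classifiers' relevance probabilities is only one of many possible fusion rules, and the probabilities and threshold here are invented examples, not values from the paper.

```python
# Illustrative late combination of two relevance classifiers: one trained on
# textual features, one on image features. Probabilities and threshold are
# made-up stand-ins for the paper's trained models.

def combined_relevance(p_text, p_image, threshold=0.5):
    """Average the two classifiers' relevance probabilities, then threshold."""
    return (p_text + p_image) / 2.0 >= threshold

# Strong image evidence can compensate for weak text evidence, and vice versa:
weak_text_strong_image = combined_relevance(0.3, 0.9)   # average 0.6
weak_both = combined_relevance(0.3, 0.4)                # average 0.35
```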


Information Retrieval | 2016

State-of-the-art in biomedical literature retrieval for clinical cases: a survey of the TREC 2014 CDS track

Kirk Roberts; Matthew S. Simpson; Dina Demner-Fushman; Ellen M. Voorhees; William R. Hersh

Providing access to relevant biomedical literature in a clinical setting has the potential to bridge a critical gap in evidence-based medicine. Here, our goal is specifically to provide relevant articles to clinicians to improve their decision-making in diagnosing, treating, and testing patients. To this end, the TREC 2014 Clinical Decision Support Track evaluated a system’s ability to retrieve relevant articles in one of three categories (Diagnosis, Treatment, Test) using an idealized form of a patient medical record. Over 100 submissions from over 25 participants were evaluated on 30 topics, resulting in over 37k relevance judgments. In this article, we provide an overview of the task, a survey of the information retrieval methods employed by the participants, an analysis of the results, and a discussion on the future directions for this challenging yet important task.


Information Retrieval | 2014

Multimodal biomedical image indexing and retrieval using descriptive text and global feature mapping

Matthew S. Simpson; Dina Demner-Fushman; Sameer K. Antani; George R. Thoma

The images found within biomedical articles are sources of essential information useful for a variety of tasks. Due to the rapid growth of biomedical knowledge, image retrieval systems are increasingly becoming necessary tools for quickly accessing the most relevant images from the literature for a given information need. Unfortunately, article text can be a poor substitute for image content, limiting the effectiveness of existing text-based retrieval methods. Additionally, the use of visual similarity by content-based retrieval methods as the sole indicator of image relevance is problematic since the importance of an image can depend on its context rather than its appearance. For biomedical image retrieval, multimodal approaches are often desirable. We describe in this work a practical multimodal solution for indexing and retrieving the images contained in biomedical articles. Recognizing the importance of text in determining image relevance, our method combines a predominately text-based image representation with a limited amount of visual information, in the form of quantized content-based visual features, through a process called global feature mapping. The resulting multimodal image surrogates are easily indexed and searched using existing text-based retrieval systems. Our experimental results demonstrate that our multimodal strategy significantly improves upon the retrieval accuracy of existing approaches. In addition, unlike many retrieval methods that utilize content-based visual features, the response time of our approach is negligible, making it suitable for use with large collections.
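The idea of turning quantized visual features into indexable tokens can be sketched as follows: each global feature vector is assigned to its nearest codebook centroid, and the resulting "visual terms" are appended to the caption text so an ordinary text engine can index them. The codebook, feature vectors, and term names below are illustrative stand-ins, not the paper's actual global feature mapping.

```python
import math

# Map global visual feature vectors onto discrete "visual terms" that a
# text search engine can index alongside caption words. The codebook
# centroids and term names are hypothetical.

CENTROIDS = {
    "v_dark":   [0.1, 0.2],
    "v_bright": [0.9, 0.8],
    "v_edge":   [0.5, 0.9],
}

def nearest_visual_term(feature):
    """Assign a feature vector to the closest codebook centroid."""
    return min(CENTROIDS, key=lambda t: math.dist(feature, CENTROIDS[t]))

def image_surrogate(caption_words, features):
    """Concatenate caption text with quantized visual terms."""
    return caption_words + [nearest_visual_term(f) for f in features]

doc = image_surrogate(["chest", "radiograph"], [[0.12, 0.25], [0.55, 0.85]])
```

Because the surrogate is plain tokens, an existing inverted index handles the multimodal representation with no special support, which is why the response time stays negligible.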


Computerized Medical Imaging and Graphics | 2015

Literature-based biomedical image classification and retrieval

Matthew S. Simpson; Daekeun You; Md. Mahmudur Rahman; Zhiyun Xue; Dina Demner-Fushman; Sameer K. Antani; George R. Thoma

Literature-based image informatics techniques are essential for managing the rapidly increasing volume of information in the biomedical domain. Compound figure separation, modality classification, and image retrieval are three related tasks useful for enabling efficient access to the most relevant images contained in the literature. In this article, we describe approaches to these tasks and the evaluation of our methods as part of the 2013 medical track of ImageCLEF. In performing each of these tasks, the textual and visual features used to represent images are an important consideration often left unaddressed. Therefore, we also describe a gradient-based optimization strategy for determining meaningful combinations of features and apply the method to the image retrieval task. An evaluation of our optimization strategy indicates the method is capable of producing statistically significant improvements in retrieval performance. Furthermore, the results of the 2013 ImageCLEF evaluation demonstrate the effectiveness of our techniques. In particular, our text-based and mixed image retrieval methods ranked first among all the participating groups.
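A gradient-based fit of feature-combination weights can be sketched, under assumptions, as logistic-style gradient descent over labeled (query, document) pairs, learning a weighting that scores relevant pairs above non-relevant ones. The data, loss, and learning rate are stand-ins for the optimization strategy described above, which targets retrieval performance directly.

```python
import math

# Toy gradient descent over per-feature similarity scores. Each pair gives
# [text_similarity, visual_similarity] and a relevance label; all values
# are invented for illustration.

pairs = [
    ([0.9, 0.2], 1),
    ([0.8, 0.7], 1),
    ([0.3, 0.6], 0),
    ([0.2, 0.1], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(pairs, lr=0.5, steps=500):
    """Learn combination weights by logistic-loss gradient steps."""
    w = [0.0, 0.0]
    for _ in range(steps):
        for x, y in pairs:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(len(w)):
                w[i] += lr * (y - p) * x[i]   # gradient of the log-loss
    return w

w = fit(pairs)

def score(x):
    """Combined similarity under the learned weights."""
    return sum(wi * xi for wi, xi in zip(w, x))
```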


International Journal of Multimedia Information Retrieval | 2014

Interactive cross and multimodal biomedical image retrieval based on automatic region-of-interest (ROI) identification and classification

Md. Mahmudur Rahman; Daekeun You; Matthew S. Simpson; Sameer K. Antani; Dina Demner-Fushman; George R. Thoma

In biomedical articles, authors often use annotation markers such as arrows, letters, or symbols overlaid on figures and illustrations to highlight ROIs. These annotations are then referenced and correlated with concepts in the caption text or figure citations in the article text. This association creates a bridge between the visual characteristics of important regions within an image and their semantic interpretation. Identifying these assists in extracting ROIs that are likely to be highly relevant to the discussion in the article text. The aim of this work is to perform semantic search without knowing the concept keyword or the specific name of the visual pattern or appearance. We consider the problem of cross and multimodal retrieval of images from articles which contains components of text and images. Our proposed method localizes and recognizes the annotations by utilizing a combination of rule-based and statistical image processing techniques. The image regions are then annotated for classification using biomedical concepts obtained from a glossary of imaging terms. Similar automatic ROI extraction can be applied to query images, or to interactively mark an ROI. As a result, visual characteristics of the ROIs can be mapped to text concepts and then used to search image captions. In addition, the system can toggle the search process from purely perceptual to a conceptual one (crossmodal) based on utilizing user feedback or integrate both perceptual and conceptual search in a multimodal search process. The hypothesis, that such approaches would improve biomedical image retrieval, was validated through experiments on a biomedical article dataset of thoracic CT scans.


Journal of Web Semantics | 2010

Invited paper: Interactive publication: The document as a research tool

George R. Thoma; Glenn Ford; Sameer K. Antani; Dina Demner-Fushman; Michael Chung; Matthew S. Simpson

The increasing prevalence of multimedia and research data generated by scientific work affords an opportunity to reformulate the idea of a scientific article from the traditional static document, or even one with links to supplemental material in remote databases, to a self-contained, multimedia-rich interactive publication. This paper describes our concept of such a document, and the design of tools for authoring (Forge) and visualization/analysis (Panorama). They are platform-independent applications written in Java, and developed in Eclipse using its Rich Client Platform (RCP) framework. Both applications operate on PDF files with links to XML files that define the media type, location, and action to be performed. We also briefly cite the challenges posed by the potentially large size of interactive publications, the need for evaluating their value to improved comprehension and learning, and the need for their long-term preservation by the National Library of Medicine and other libraries.


Document Recognition and Retrieval | 2013

A robust pointer segmentation in biomedical images toward building a visual ontology for biomedical article retrieval

Daekeun You; Matthew S. Simpson; Sameer K. Antani; Dina Demner-Fushman; George R. Thoma

Pointers (arrows and symbols) are frequently used in biomedical images to highlight specific image regions of interest (ROIs) that are mentioned in figure captions and/or text discussion. Detection of pointers is the first step toward extracting relevant visual features from ROIs and combining them with textual descriptions for a multimodal (text and image) biomedical article retrieval system. Recently we developed a pointer recognition algorithm based on an edge-based pointer segmentation method, and subsequently reported improvements to our initial approach involving the use of Active Shape Models (ASM) for pointer recognition and a region growing-based method for pointer segmentation. These methods improved the recall of pointer recognition but contributed little to its precision. The method discussed in this article is our recent effort to improve the precision rate. Evaluations performed on two datasets, together with comparisons against other pointer segmentation methods, show significantly improved precision and the highest F1 score.
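Region growing, one of the segmentation ingredients mentioned above, can be illustrated on a toy grayscale grid: starting from a seed pixel, the region expands to 4-connected neighbors whose intensity stays within a tolerance of the seed. The image and tolerance are invented, and the actual method layers edge cues and shape models on top of this basic idea.

```python
from collections import deque

# Toy region growing: breadth-first expansion from a seed pixel to
# 4-connected neighbors with similar intensity. Image values are invented.

def region_grow(img, seed, tol=30):
    """Return the set of (row, col) pixels grown from the seed."""
    h, w = len(img), len(img[0])
    sr, sc = seed
    base = img[sr][sc]
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w
                    and (nr, nc) not in region
                    and abs(img[nr][nc] - base) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

img = [  # a dark vertical stroke (an "arrow shaft") on a bright background
    [255, 255,  10, 255],
    [255,  12,  11, 255],
    [255, 255,  13, 255],
]
arrow = region_grow(img, (1, 1))
```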


Meeting of the Association for Computational Linguistics | 2009

Using Non-Lexical Features to Identify Effective Indexing Terms for Biomedical Illustrations

Matthew S. Simpson; Dina Demner-Fushman; Charles Sneiderman; Sameer K. Antani; George R. Thoma

Automatic image annotation is an attractive approach for enabling convenient access to images found in a variety of documents. Since image captions and relevant discussions found in the text can be useful for summarizing the content of images, it is also possible that this text can be used to generate salient indexing terms. Unfortunately, this problem is generally domain-specific because indexing terms that are useful in one domain can be ineffective in others. Thus, we present a supervised machine learning approach to image annotation utilizing non-lexical features extracted from image-related text to select useful terms. We apply this approach to several subdomains of the biomedical sciences and show that we are able to reduce the number of ineffective indexing terms.
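The use of non-lexical features can be illustrated with a toy term filter: each candidate indexing term is scored by where and how it occurs in the caption rather than by its spelling, so the same model can transfer across subdomains. The features, weights, and bias below are hand-set stand-ins for a trained model.

```python
# Score candidate indexing terms by non-lexical features: frequency in the
# caption, how early the term appears, and whether it is alphabetic. The
# weights and bias are hypothetical, standing in for a trained classifier.

def features(term, caption):
    words = caption.lower().split()
    tf = words.count(term)
    first = words.index(term) / max(len(words) - 1, 1) if term in words else 1.0
    return [tf, 1.0 - first, float(term.isalpha())]

WEIGHTS = [0.5, 1.0, 1.5]   # hand-set stand-ins for learned weights
BIAS = -1.6

def keep(term, caption):
    """Keep a term when its weighted non-lexical feature score is positive."""
    score = BIAS + sum(w * f for w, f in zip(WEIGHTS, features(term, caption)))
    return score > 0

caption = "Figure 2 chest radiograph showing a nodule in the left lung"
candidates = ["radiograph", "nodule", "2"]
kept = [t for t in candidates if keep(t, caption)]
```

The figure number "2" is rejected on its non-lexical profile alone, without any subdomain-specific stopword list, which is the kind of term reduction the approach aims for.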

Collaboration


Dive into Matthew S. Simpson's collaborations.

Top Co-Authors

Dina Demner-Fushman (National Institutes of Health)
George R. Thoma (National Institutes of Health)
Sameer K. Antani (National Institutes of Health)
Ellen M. Voorhees (National Institute of Standards and Technology)
Kirk Roberts (University of Texas Health Science Center at Houston)
Charles Sneiderman (National Institutes of Health)