Publication


Featured research published by Alan R. Aronson.


Journal of the American Medical Informatics Association | 2010

An overview of MetaMap: historical perspective and recent advances.

Alan R. Aronson; François-Michel Lang

MetaMap is a widely available program providing access to the concepts in the Unified Medical Language System (UMLS) Metathesaurus from biomedical text. This study reports on MetaMap's evolution over more than a decade, concentrating on those features arising out of the research needs of the biomedical informatics community both within and outside of the National Library of Medicine. Such features include the detection of author-defined acronyms/abbreviations, the ability to browse the Metathesaurus for concepts even tenuously related to input text, the detection of negation in situations in which the polarity of predications is important, word sense disambiguation (WSD), and various technical and algorithmic features. Near-term plans for MetaMap development include the incorporation of chemical name recognition and enhanced WSD.


Journal of the American Medical Informatics Association | 2003

Generating Hypotheses by Discovering Implicit Associations in the Literature: A Case Report of a Search for New Potential Therapeutic Uses for Thalidomide

Marc Weeber; Rein Vos; Henny Klein; Lolkje T. W. de Jong-van den Berg; Alan R. Aronson; Grietje Molema

The availability of scientific bibliographies through online databases provides a rich source of information for scientists to support their research. However, the risk of this pervasive availability is that an individual researcher may fail to find relevant information that is outside the direct scope of interest. Following Swanson's ABC model of disjoint but complementary structures in the biomedical literature, we have developed a discovery support tool to systematically analyze the scientific literature in order to generate novel and plausible hypotheses. In this case report, we employ the system to find potentially new target diseases for the drug thalidomide. We find solid bibliographic evidence suggesting that thalidomide might be useful for treating acute pancreatitis, chronic hepatitis C, Helicobacter pylori-induced gastritis, and myasthenia gravis. However, experimental and clinical evaluation is needed to validate these hypotheses and to assess the trade-off between therapeutic benefits and toxicities.
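The ABC model mentioned in this abstract can be illustrated with a minimal sketch. It assumes the literature has already been reduced to a list of co-occurring term pairs; the function name `abc_candidates` and the placeholder terms are hypothetical, and this is not the discovery support tool the authors describe. Starting from a term A, it collects the intermediate terms B that co-occur with A, then keeps the terms C that co-occur with some B but never directly with A.

```python
from collections import defaultdict


def abc_candidates(cooccurrences, a_term):
    """Return C terms linked to a_term only through intermediate B terms.

    cooccurrences: iterable of (term, term) pairs, assumed to have been
    extracted from the literature beforehand (hypothetical input).
    """
    neighbours = defaultdict(set)
    for left, right in cooccurrences:
        neighbours[left].add(right)
        neighbours[right].add(left)

    b_terms = set(neighbours[a_term])
    candidates = defaultdict(set)
    for b in b_terms:
        for c in neighbours[b]:
            # Keep only C terms never seen directly with A (the disjoint part).
            if c != a_term and c not in b_terms:
                candidates[c].add(b)
    # Rank by the number of distinct linking B terms, most links first.
    return sorted(candidates.items(), key=lambda item: (-len(item[1]), item[0]))


# Placeholder terms only; these pairs are invented for illustration.
pairs = [
    ("A", "B1"), ("A", "B2"),
    ("B1", "C1"), ("B2", "C1"), ("B2", "C2"),
    ("A", "C3"), ("B1", "C3"),  # C3 already co-occurs with A, so it is excluded
]
ranked = abc_candidates(pairs, "A")
```

Ranking by the count of linking B terms is one simple choice; a real system would need a more careful scoring and filtering step.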


Journal of Biomedical Informatics | 2003

Towards linking patients and clinical information: detecting UMLS concepts in e-mail

Patricia Flatley Brennan; Alan R. Aronson

The purpose of this project is to explore the feasibility of detecting terms within the electronic messages of patients that could be used to effectively search electronic knowledge resources and bring health information resources into the hands of patients. Our team is exploring the application of the natural language processing (NLP) tools built within the Lister Hill Center at the National Library of Medicine (NLM) to the challenge of detecting relevant concepts from the Unified Medical Language System (UMLS) within the free text of lay people's electronic messages (e-mail). We obtained a sample of electronic messages sent by patients participating in a randomized field evaluation of an internet-based home care support service to the project nurse, and we subjected elements of these messages to a series of analyses using several vocabularies from the UMLS Metathesaurus and the selected NLP tools. The nursing vocabularies provide an excellent starting point for this exercise because their domain encompasses patients' responses to health challenges. In successive runs we augmented six nursing vocabularies (NANDA Nursing Diagnosis, Nursing Interventions Classification, Nursing Outcomes Classification, Home Health Classification, Omaha System, and the Patient Care Data Set) with selected sets of clinical terminologies (International Classification of Primary Care; International Classification of Primary Care - American English; Micromedex DRUGDEX; National Drug Data File; Thesaurus of Psychological Terms; WHO Adverse Drug Reaction Terminology) and then additionally with either Medical Subject Headings (MeSH) or SNOMED International terms. The best performance was obtained when the nursing vocabularies were complemented with selected clinical terminologies.
These findings not only have implications for facilitating lay people's access to electronic knowledge resources but may also be of assistance in developing new tools to aid in linking free text (e.g., clinical notes) to lexically complex knowledge resources such as those emerging from the Human Genome Project.


BMC Bioinformatics | 2011

Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation

Antonio Jimeno-Yepes; Bridget T. McInnes; Alan R. Aronson

Background: Evaluation of Word Sense Disambiguation (WSD) methods in the biomedical domain is difficult because the available resources are either too small or too focused on specific types of entities (e.g. diseases or genes). We present a method that can be used to automatically develop a WSD test collection using the Unified Medical Language System (UMLS) Metathesaurus and the manual MeSH indexing of MEDLINE. We demonstrate the use of this method by developing such a data set, called MSH WSD. Methods: In our method, the Metathesaurus is first screened to identify ambiguous terms whose possible senses consist of two or more MeSH headings. We then use each ambiguous term and its corresponding MeSH heading to extract MEDLINE citations where the term and only one of the MeSH headings co-occur. The term found in the MEDLINE citation is automatically assigned the UMLS CUI linked to the MeSH heading. Each instance has been assigned a UMLS Concept Unique Identifier (CUI). We compare the characteristics of the MSH WSD data set to the previously existing NLM WSD data set. Results: The resulting MSH WSD data set consists of 106 ambiguous abbreviations, 88 ambiguous terms and 9 which are a combination of both, for a total of 203 ambiguous entities. For each ambiguous term/abbreviation, the data set contains a maximum of 100 instances per sense obtained from MEDLINE. We evaluated the reliability of the MSH WSD data set using existing knowledge-based methods and compared their performance to that of the results previously obtained by these algorithms on the pre-existing data set, NLM WSD. We show that the knowledge-based methods achieve different results but keep their relative performance except for the Journal Descriptor Indexing (JDI) method, whose performance is below the other methods. Conclusions: The MSH WSD data set allows the evaluation of WSD algorithms in the biomedical domain. Compared to previously existing data sets, MSH WSD contains a larger number of biomedical terms/abbreviations and covers the largest set of UMLS Semantic Types. Furthermore, the MSH WSD data set has been generated automatically reusing already existing annotations and, therefore, can be regenerated from subsequent UMLS versions.
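The extraction step described in this abstract (keep a citation only when the ambiguous term co-occurs with exactly one of its candidate MeSH headings, then label it with the CUI linked to that heading) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `build_instances`, the placeholder CUIs, the toy citations, and the plain substring match are all hypothetical. The cap of 100 instances per sense comes from the abstract.

```python
def build_instances(term, heading_to_cui, citations, max_per_sense=100):
    """Label citations containing `term` using their manual MeSH indexing.

    heading_to_cui: MeSH heading -> CUI for each candidate sense of `term`.
    citations: iterable of (text, set_of_mesh_headings) pairs.
    """
    instances = {cui: [] for cui in heading_to_cui.values()}
    for text, headings in citations:
        # Simplifying assumption: case-insensitive substring match on the term.
        if term.lower() not in text.lower():
            continue
        matched = [h for h in heading_to_cui if h in headings]
        # Keep the citation only when exactly one candidate heading is present.
        if len(matched) != 1:
            continue
        cui = heading_to_cui[matched[0]]
        if len(instances[cui]) < max_per_sense:
            instances[cui].append(text)
    return instances


# Placeholder CUIs and invented citations, for illustration only.
senses = {"Cold Temperature": "CUI-A", "Common Cold": "CUI-B"}
toy_citations = [
    ("Exposure to cold alters metabolism.", {"Cold Temperature", "Metabolism"}),
    ("Zinc for the common cold.", {"Common Cold"}),
    ("Cold exposure and the common cold.", {"Cold Temperature", "Common Cold"}),
    ("No mention here.", {"Common Cold"}),
]
labelled = build_instances("cold", senses, toy_citations)
```

The third toy citation is discarded because both candidate headings are present, which is the case the method treats as unusable for automatic labelling.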


Meeting of the Association for Computational Linguistics | 2007

From indexing the biomedical literature to coding clinical text: experience with MTI and machine learning approaches

Alan R. Aronson; Olivier Bodenreider; Dina Demner-Fushman; Kin Wah Fung; Vivian K. Lee; James G. Mork; Aurélie Névéol; Lee B. Peters; Willie J. Rogers

This paper describes the application of an ensemble of indexing and classification systems, which have been shown to be successful in information retrieval and classification of medical literature, to a new task of assigning ICD-9-CM codes to the clinical history and impression sections of radiology reports. The basic methods used are: a modification of the NLM Medical Text Indexer system, SVM, k-NN and a simple pattern-matching method. The basic methods are combined using a variant of stacking. Evaluated in the context of a Medical NLP Challenge, fusion produced an F-score of 0.85 on the Challenge test set, which is considerably above the mean Challenge F-score of 0.77 for 44 participating groups.


BMC Bioinformatics | 2010

Knowledge-based biomedical word sense disambiguation: comparison of approaches

Antonio Jimeno-Yepes; Alan R. Aronson

Background: Word sense disambiguation (WSD) algorithms attempt to select the proper sense of ambiguous terms in text. Resources like the UMLS provide a reference thesaurus to be used to annotate the biomedical literature. Statistical learning approaches have produced good results, but the size of the UMLS makes it infeasible to produce training data covering the whole domain. Methods: We present research on existing WSD approaches based on knowledge bases, which complement the studies performed on statistical learning. We compare four approaches which rely on the UMLS Metathesaurus as the source of knowledge. The first approach compares the overlap of the context of the ambiguous word to the candidate senses based on a representation built out of the definitions, synonyms and related terms. The second approach collects training data for each of the candidate senses to perform WSD based on queries built using monosemous synonyms and related terms. These queries are used to retrieve MEDLINE citations. Then, a machine learning approach is trained on this corpus. The third approach is a graph-based method which exploits the structure of the Metathesaurus network of relations to perform unsupervised WSD. This approach ranks nodes in the graph according to their relative structural importance. The last approach uses the semantic types assigned to the concepts in the Metathesaurus to perform WSD. The context of the ambiguous word and semantic types of the candidate concepts are mapped to Journal Descriptors. These mappings are compared to decide among the candidate concepts. Results are provided estimating accuracy of the different methods on the WSD test collection available from the NLM. Conclusions: We have found that the last approach achieves better results compared to the other methods. The graph-based approach, using the structure of the Metathesaurus network to estimate the relevance of the Metathesaurus concepts, does not perform well compared to the first two methods. In addition, the combination of methods improves the performance over the individual approaches. On the other hand, the performance is still below statistical learning trained on manually produced data and below the maximum frequency sense baseline. Finally, we propose several directions to improve the existing methods and to improve the Metathesaurus to be more effective in WSD.
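The first approach in this abstract (comparing the context of the ambiguous word with a representation of each candidate sense) can be sketched as a simple word-overlap count. This is only an illustration under stated assumptions: the names `tokenize` and `disambiguate`, the toy sense profiles, and the scoring by raw overlap size are hypothetical, and in the paper the profiles are built from Metathesaurus definitions, synonyms and related terms rather than hand-written strings.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of punctuation-stripped words."""
    return {w.strip(".,;:()").lower() for w in text.split()} - {""}


def disambiguate(context, sense_profiles):
    """Pick the sense whose profile shares the most words with the context.

    sense_profiles: sense label -> text describing that sense (hypothetical).
    Ties are broken alphabetically by sense label.
    """
    context_words = tokenize(context)
    scores = {
        sense: len(context_words & tokenize(profile))
        for sense, profile in sense_profiles.items()
    }
    best = max(sorted(scores), key=scores.get)
    return best, scores


# Invented profiles for the ambiguous word "cold", for illustration only.
profiles = {
    "cold temperature": "low temperature freezing environment exposure",
    "common cold": "viral infection nose throat cough rhinovirus",
}
sense, overlap = disambiguate(
    "Patients with cough and sore throat were diagnosed with a cold.", profiles
)
```

The sketch does no stop-word removal or weighting, so it should be read as the bare idea of overlap-based selection rather than a usable disambiguator.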


Journal of Biomedical Informatics | 2009

A recent advance in the automatic indexing of the biomedical literature

Aurélie Névéol; Sonya E. Shooshan; Susanne M. Humphrey; James G. Mork; Alan R. Aronson

The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH terms) to biomedical text, we focus more specifically on the new subheading attachment feature for NLM's Medical Text Indexer (MTI). Natural Language Processing, statistical, and machine learning methods of producing automatic MeSH main heading/subheading pair recommendations were assessed independently and combined. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature producing MeSH indexing recommendations compliant with current state-of-the-art indexing practice.


Journal of the American Medical Informatics Association | 2010

Extracting Rx information from clinical narrative

James G. Mork; Olivier Bodenreider; Dina Demner-Fushman; Rezarta Islamaj Doğan; François-Michel Lang; Zhiyong Lu; Aurélie Névéol; Lee B. Peters; Sonya E. Shooshan; Alan R. Aronson

OBJECTIVE The authors used the i2b2 Medication Extraction Challenge to evaluate their entity extraction methods, contribute to the generation of a publicly available collection of annotated clinical notes, and start developing methods for ontology-based reasoning using structured information generated from the unstructured clinical narrative. DESIGN Extraction of salient features of medication orders from the text of de-identified hospital discharge summaries was addressed with a knowledge-based approach using simple rules and lookup lists. The entity recognition tool, MetaMap, was combined with dose, frequency, and duration modules specifically developed for the Challenge as well as a prototype module for reason identification. MEASUREMENTS Evaluation metrics and corresponding results were provided by the Challenge organizers. RESULTS The results indicate that robust rule-based tools achieve satisfactory results in extraction of simple elements of medication orders, but more sophisticated methods are needed for identification of reasons for the orders and durations. LIMITATIONS Owing to the time constraints and nature of the Challenge, some obvious follow-on analysis has not been completed yet. CONCLUSIONS The authors plan to integrate the new modules with MetaMap to enhance its accuracy. This integration effort will provide guidance in retargeting existing tools for better processing of clinical text.
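The knowledge-based approach with simple rules and lookup lists mentioned in this abstract can be illustrated with a small sketch. It is not the authors' dose, frequency, or duration modules: the pattern `DOSE`, the toy lookup list `FREQUENCIES`, and the function `extract_order` are all hypothetical and cover only a handful of forms.

```python
import re

# Toy lookup list of frequency expressions (assumption, far from complete).
FREQUENCIES = {"daily", "bid", "tid", "qid", "prn"}

# Simple rule: a number followed by a unit, e.g. "81 mg" or "0.5 ml".
DOSE = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml|units?)\b", re.IGNORECASE)


def extract_order(sentence):
    """Return the first dose and frequency found in a sentence, if any."""
    dose = DOSE.search(sentence)
    tokens = re.findall(r"[a-z]+", sentence.lower())
    frequency = next((t for t in tokens if t in FREQUENCIES), None)
    return {
        "dose": dose.group(0) if dose else None,
        "frequency": frequency,
    }


order = extract_order("Aspirin 81 mg daily")
empty = extract_order("Continue home medications")
```

Rules of this kind handle simple, regular elements; the abstract itself notes that reasons and durations need more sophisticated methods.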


Pacific Symposium on Biocomputing | 2006

Multiple approaches to fine-grained indexing of the biomedical literature.

Aurélie Névéol; Sonya E. Shooshan; Susanne M. Humphrey; Thomas C. Rindflesch; Alan R. Aronson

The number of articles in the MEDLINE database is expected to increase tremendously in the coming years. To ensure that all these documents are indexed with continuing high quality, it is necessary to develop tools and methods that help the indexers in their daily task. We present three methods addressing a novel aspect of automatic indexing of the biomedical literature, namely producing MeSH main heading/subheading pair recommendations. The methods (dictionary-based, post-processing rules, and Natural Language Processing rules) are described and evaluated on a genetics-related corpus. The best overall performance is obtained for the subheading genetics (70% precision and 17% recall with post-processing rules, 48% precision and 37% recall with the dictionary-based method). Future work will address extending this work to all MeSH subheadings and a more thorough study of method combination.
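The two precision/recall pairs reported for the subheading genetics trade off in opposite directions, and one common way to compare them is the balanced F-measure. The abstract does not report F-measures, so the values computed below are derived here from the stated precision and recall, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as stated in the abstract for the subheading genetics.
post_processing = f_measure(0.70, 0.17)  # about 0.274
dictionary = f_measure(0.48, 0.37)       # about 0.418
```

Under this measure the dictionary-based method scores higher, although the post-processing rules remain the more precise of the two.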


Journal of Computing Science and Engineering | 2012

A One-Size-Fits-All Indexing Method Does Not Exist: Automatic Selection Based on Meta-Learning

Antonio Jimeno-Yepes; James G. Mork; Dina Demner-Fushman; Alan R. Aronson

We present a methodology that automatically selects indexing algorithms for each heading in Medical Subject Headings (MeSH), the National Library of Medicine's vocabulary for indexing MEDLINE. While manually comparing indexing methods is manageable with a limited number of MeSH headings, a large number of them makes automation of this selection desirable. Results show that this process can be automated, based on previously indexed MEDLINE citations. We find that AdaBoostM1 is better suited to index a group of MeSH headings named Check Tags, and helps improve the micro F-measure from 0.5385 to 0.7157, and the macro F-measure from 0.4123 to 0.5387 (both p < 0.01). Category: Convergence computing
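The abstract reports both micro and macro F-measures, which can differ considerably when headings vary in frequency. The sketch below shows how the two averages are usually computed from per-heading counts; the function names and the toy counts are hypothetical and are not taken from the paper.

```python
def f1(tp, fp, fn):
    """F1 from true positive, false positive and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts):
    """counts: heading -> (tp, fp, fn). Returns (micro F1, macro F1)."""
    # Macro: average the per-heading F1 values, so every heading weighs equally.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts first, so frequent headings dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn), macro


# Invented counts for one frequent and one rare heading.
toy_counts = {"frequent heading": (90, 10, 10), "rare heading": (1, 3, 3)}
micro, macro = micro_macro_f1(toy_counts)  # micro = 0.875, macro = 0.575
```

In the toy data the rare heading pulls the macro average well below the micro average, which is the usual reason for reporting both.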

Collaboration


Top Co-Authors

James G. Mork, National Institutes of Health
Dina Demner-Fushman, National Institutes of Health
Thomas C. Rindflesch, National Institutes of Health
Susanne M. Humphrey, National Institutes of Health
Aurélie Névéol, National Institutes of Health
Antonio Jimeno-Yepes, National Institutes of Health
Sonya E. Shooshan, National Institutes of Health
Antonio Jimeno Yepes, National Institutes of Health
Olivier Bodenreider, National Institutes of Health
Russell F. Loane, National Institutes of Health