James G. Mork | Researchain

Publication

Featured research published by James G. Mork.

Meeting of the Association for Computational Linguistics | 2007

From indexing the biomedical literature to coding clinical text: experience with MTI and machine learning approaches

Alan R. Aronson; Olivier Bodenreider; Dina Demner-Fushman; Kin Wah Fung; Vivian K. Lee; James G. Mork; Aurélie Névéol; Lee B. Peters; Willie J. Rogers

This paper describes the application of an ensemble of indexing and classification systems, which have been shown to be successful in information retrieval and classification of medical literature, to a new task of assigning ICD-9-CM codes to the clinical history and impression sections of radiology reports. The basic methods used are: a modification of the NLM Medical Text Indexer system, SVM, k-NN and a simple pattern-matching method. The basic methods are combined using a variant of stacking. Evaluated in the context of a Medical NLP Challenge, fusion produced an F-score of 0.85 on the Challenge test set, which is considerably above the mean Challenge F-score of 0.77 for 44 participating groups.

Journal of Biomedical Informatics | 2009

A recent advance in the automatic indexing of the biomedical literature

Aurélie Névéol; Sonya E. Shooshan; Susanne M. Humphrey; James G. Mork; Alan R. Aronson

The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH terms) to biomedical text, we focus more specifically on the new subheading attachment feature for NLM’s Medical Text Indexer (MTI). Natural Language Processing, statistical, and machine learning methods of producing automatic MeSH main heading/subheading pair recommendations were assessed independently and combined. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature producing MeSH indexing recommendations compliant with current state-of-the-art indexing practice.

Journal of the American Medical Informatics Association | 2010

Extracting Rx information from clinical narrative

James G. Mork; Olivier Bodenreider; Dina Demner-Fushman; Rezarta Islamaj Doğan; François-Michel Lang; Zhiyong Lu; Aurélie Névéol; Lee B. Peters; Sonya E. Shooshan; Alan R. Aronson

OBJECTIVE: The authors used the i2b2 Medication Extraction Challenge to evaluate their entity extraction methods, contribute to the generation of a publicly available collection of annotated clinical notes, and start developing methods for ontology-based reasoning using structured information generated from the unstructured clinical narrative. DESIGN: Extraction of salient features of medication orders from the text of de-identified hospital discharge summaries was addressed with a knowledge-based approach using simple rules and lookup lists. The entity recognition tool, MetaMap, was combined with dose, frequency, and duration modules specifically developed for the Challenge as well as a prototype module for reason identification. MEASUREMENTS: Evaluation metrics and corresponding results were provided by the Challenge organizers. RESULTS: The results indicate that robust rule-based tools achieve satisfactory results in extraction of simple elements of medication orders, but more sophisticated methods are needed for identification of reasons for the orders and durations. LIMITATIONS: Owing to the time constraints and nature of the Challenge, some obvious follow-on analysis has not been completed yet. CONCLUSIONS: The authors plan to integrate the new modules with MetaMap to enhance its accuracy. This integration effort will provide guidance in retargeting existing tools for better processing of clinical text.
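As a rough illustration of what "simple rules and lookup lists" can look like for medication orders, the following minimal sketch pairs a lookup list of drug names with a regular expression for doses. It is not the authors' system and does not use MetaMap; the lists `DRUGS` and `FREQUENCIES`, the pattern `DOSE_RE`, and the function `extract_orders` are all hypothetical names chosen for this example, and the one-order-per-line assumption is a simplification.

```python
import re
from typing import Dict, List, Optional

# Hypothetical lookup lists; a real system would use a much larger vocabulary.
DRUGS = {"aspirin", "metformin", "lisinopril"}
FREQUENCIES = {"daily", "bid", "tid", "qid", "prn"}

# Hypothetical dose rule: a number followed by a unit, e.g. "81 mg" or "500mg".
DOSE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml|units?)\b", re.IGNORECASE)


def extract_orders(text: str) -> List[Dict[str, Optional[str]]]:
    """Return one record per line that mentions a drug from the lookup list."""
    orders: List[Dict[str, Optional[str]]] = []
    for line in text.splitlines():
        tokens = re.findall(r"[A-Za-z]+", line.lower())
        drug = next((t for t in tokens if t in DRUGS), None)
        if drug is None:
            continue  # no known drug on this line, so no order is recorded
        dose = DOSE_RE.search(line)
        freq = next((t for t in tokens if t in FREQUENCIES), None)
        orders.append({
            "drug": drug,
            "dose": f"{dose.group(1)} {dose.group(2).lower()}" if dose else None,
            "frequency": freq,
        })
    return orders
```

Elements such as duration and the reason for an order, which the abstract reports as harder, are deliberately left out of this sketch.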

Journal of Computing Science and Engineering | 2012

A One-Size-Fits-All Indexing Method Does Not Exist: Automatic Selection Based on Meta-Learning

Antonio Jimeno-Yepes; James G. Mork; Dina Demner-Fushman; Alan R. Aronson

We present a methodology that automatically selects indexing algorithms for each heading in Medical Subject Headings (MeSH), the National Library of Medicine’s vocabulary for indexing MEDLINE. While manually comparing indexing methods is manageable with a limited number of MeSH headings, a large number of them makes automation of this selection desirable. Results show that this process can be automated, based on previously indexed MEDLINE citations. We find that AdaBoostM1 is better suited to index a group of MeSH headings named Check Tags, and helps improve the micro F-measure from 0.5385 to 0.7157, and the macro F-measure from 0.4123 to 0.5387 (both p < 0.01). Category: Convergence computing
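The abstract reports both micro and macro F-measures. The sketch below shows one common way to compute the two from per-heading counts, assuming the macro F-measure is the unweighted mean of per-heading F1 scores (other definitions exist, and the abstract does not say which one was used). The function names and the example counts are hypothetical.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one MeSH heading
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_heading: Dict[str, Counts]) -> float:
    """Pool the counts over all headings, then compute a single F1."""
    tp = sum(c[0] for c in per_heading.values())
    fp = sum(c[1] for c in per_heading.values())
    fn = sum(c[2] for c in per_heading.values())
    return f1(tp, fp, fn)


def macro_f1(per_heading: Dict[str, Counts]) -> float:
    """Compute F1 per heading, then take the unweighted mean."""
    if not per_heading:
        return 0.0
    return sum(f1(*c) for c in per_heading.values()) / len(per_heading)


# Hypothetical counts: one frequent heading indexed well, one rare heading indexed poorly.
example_counts = {"Humans": (90, 10, 10), "Mice": (1, 4, 5)}
```

With counts like these, the micro score is dominated by the frequent heading while the macro score weights both headings equally, which is why the two measures can differ substantially.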

BMC Bioinformatics | 2013

GeneRIF indexing: sentence selection based on machine learning

Antonio Jimeno-Yepes; J Caitlin Sticco; James G. Mork; Alan R. Aronson

Background: A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries. The creation of GeneRIF entries involves the identification of the genes mentioned in MEDLINE® citations and the sentences describing a novel function. Results: We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine if a sentence should be selected for GeneRIF indexing. Features are derived from the sentences or using mechanisms to augment the information provided by them: assigning a discourse label using a previously trained model, for example. We show that machine learning approaches with specific feature combinations achieve results close to one of the annotators. We have evaluated different feature sets and learning algorithms. In particular, Naïve Bayes achieves better performance with a selection of features similar to one used in related work, which considers the location of the sentence, the discourse of the sentence and the functional terminology in it. Conclusions: The current performance is at a level similar to human annotation and it shows that machine learning can be used to automate the task of sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species.

BMC Bioinformatics | 2013

MeSH indexing based on automatically generated summaries

Antonio Jimeno-Yepes; Laura Plaza; James G. Mork; Alan R. Aronson; Alberto Espuny Díaz

Background: MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using as reference the Medical Subject Headings (MeSH) controlled vocabulary. For this task, the human indexers read the full text of the article. Due to the growth of MEDLINE, the NLM Indexing Initiative explores indexing methodologies that can support the task of the indexers. Medical Text Indexer (MTI) is a tool developed by the NLM Indexing Initiative to provide MeSH indexing recommendations to indexers. Currently, the input to MTI is MEDLINE citations, title and abstract only. Previous work has shown that using full text as input to MTI increases recall, but decreases precision sharply. We propose using summaries generated automatically from the full text for the input to MTI to use in the task of suggesting MeSH headings to indexers. Summaries distill the most salient information from the full text, which might increase the coverage of automatic indexing approaches based on MEDLINE. We hypothesize that if the results were good enough, manual indexers could possibly use automatic summaries instead of the full texts, along with the recommendations of MTI, to speed up the process while maintaining high quality of indexing results. Results: We have generated summaries of different lengths using two different summarizers, and evaluated the MTI indexing on the summaries using different algorithms: MTI, individual MTI components, and machine learning. The results are compared to those of full text articles and MEDLINE citations. Our results show that automatically generated summaries achieve similar recall but higher precision compared to full text articles. Compared to MEDLINE citations, summaries achieve higher recall but lower precision. Conclusions: Our results show that automatic summaries produce better indexing than full text articles. Summaries produce similar recall to full text but much better precision, which seems to indicate that automatic summaries can efficiently capture the most important contents within the original articles. The combination of MEDLINE citations and automatically generated summaries could improve the recommendations suggested by MTI. On the other hand, indexing performance might be dependent on the MeSH heading being indexed. Summarization techniques could thus be considered as a feature selection algorithm that might have to be tuned individually for each MeSH heading.

Journal of the Medical Library Association | 2011

A retrospective cohort study of structured abstracts in MEDLINE, 1992–2006

Anna Ripple; James G. Mork; Lou S. Knecht; Betsy L. Humphreys

Structured abstracts contain distinct labeled sections (e.g., “RESULTS”). The MEDLINE/PubMed database incorporates English-language abstracts that appear in the journals that the US National Library of Medicine (NLM) indexes. If English-language structured abstracts appear in a journal that is indexed, the labels in these abstracts usually appear in all uppercase letters, generally followed by a colon, in MEDLINE/PubMed citations [1]. Several years after formats for more informative abstracts were proposed [2–5], NLM studied the structured abstracts that appeared in MEDLINE from 1989–1991 as an initial step in exploring their utility in enhancing bibliographic retrieval [6]. This early study showed that structured abstracts were an emerging, but rapidly growing phenomenon; that MEDLINE records with structured abstracts tended to have more access points (Medical Subject Headings [MeSH] terms and text words) than MEDLINE records as a whole; and that there was significant variation in the structured abstract formats that different journals prescribed. Implementation of structured abstracts by biomedical journals has been examined on a small scale in the clinical medicine domain [7, 8], but no large-scale examination across all of MEDLINE has occurred since the first exploratory study by NLM. Hence, the objective of this study was to conduct a retrospective cohort study to measure and characterize the growth in structured abstracts in MEDLINE since 1991, with a view, again, toward exploring their utility in enhancing information display and retrieval.
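The labeling convention described here (all-uppercase labels, generally followed by a colon) lends itself to a simple pattern match. The sketch below is one possible heuristic for splitting an abstract into labeled sections under that convention; it is not the method used in the study, and `LABEL_RE` and `split_structured_abstract` are hypothetical names. Labels that lack a colon or use mixed case would be missed.

```python
import re
from typing import List, Tuple

# Assumed convention: a label is two or more uppercase letters (spaces allowed
# inside, as in "MAIN OUTCOME MEASURES"), at the start of the text or after
# whitespace, and immediately followed by a colon.
LABEL_RE = re.compile(r"(?:^|(?<=\s))([A-Z][A-Z ]*[A-Z]):\s*")


def split_structured_abstract(text: str) -> List[Tuple[str, str]]:
    """Return (label, section text) pairs; an empty list means no labels were found."""
    matches = list(LABEL_RE.finditer(text))
    sections: List[Tuple[str, str]] = []
    for i, match in enumerate(matches):
        # A section runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((match.group(1), text[match.end():end].strip()))
    return sections
```

An abstract for which the function returns an empty list would be treated as unstructured under this heuristic.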

International Health Informatics Symposium | 2012

MEDLINE MeSH indexing: lessons learned from machine learning and future directions

Antonio Jimeno Yepes; James G. Mork; Bartłomiej Wilkowski; Dina Demner Fushman; Alan R. Aronson

Due to the large yearly growth of MEDLINE, MeSH indexing is becoming a more difficult task for a relatively small group of highly qualified indexing staff at the US National Library of Medicine (NLM). The Medical Text Indexer (MTI) is a support tool for assisting indexers; this tool relies on MetaMap and a k-NN approach called PubMed Related Citations (PRC). Our motivation is to improve the quality of MTI based on machine learning. Typical machine learning approaches fit this indexing task into text categorization. In this work, we have studied some Medical Subject Headings (MeSH) recommended by MTI and analyzed the issues when using standard machine learning algorithms. We show that in some cases machine learning can improve the annotations already recommended by MTI, that machine learning based on low variance methods achieves better performance and that each MeSH heading presents a different behavior. In addition, there are several factors which make this task difficult (e.g. limited access to the full-text of the citations) which provide direction for future work.

Bioinformatics | 2009

Comment on ‘MeSH-up

Aurélie Névéol; James G. Mork; Alan R. Aronson

Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

BMC Bioinformatics | 2015

Feature engineering for MEDLINE citation categorization with MeSH

Antonio Jimeno Yepes; Laura Plaza; Jorge Carrillo-de-Albornoz; James G. Mork; Alan R. Aronson

Background: Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings. In addition to unigrams and bigrams, these features include noun phrases, citation meta-data, citation structure, and semantic annotation of the citations. Results: Traditional features like unigrams and bigrams exhibit strong performance compared to other feature sets. Little or no improvement is obtained when using meta-data or citation structure. Noun phrases are too sparse and thus have lower performance compared to more traditional features. Conceptual annotation of the texts by MetaMap shows similar performance compared to unigrams, but adding concepts from the UMLS taxonomy does not improve the performance of using only mapped concepts. The combination of all the features performs largely better than any individual feature set considered. In addition, this combination improves the performance of a state-of-the-art MeSH indexer. Concerning the machine learning algorithms, we find that those that are more resilient to class imbalance largely obtain better performance. Conclusions: We conclude that even though traditional features such as unigrams and bigrams have strong performance compared to other features, it is possible to combine them to effectively improve the performance of the bag-of-words representation. We have also found that the combination of the learning algorithm and feature sets has an influence on the overall performance of the system. Moreover, using learning algorithms resilient to class imbalance largely improves performance. However, when using a large set of features, consideration needs to be taken with algorithms due to the risk of over-fitting. Specific combinations of learning algorithms and features for individual MeSH headings could further increase the performance of an indexing system.
