Publication


Featured research published by Aaron M. Cohen.


International Journal of Medical Informatics | 2004

A categorization and analysis of the criticisms of Evidence-Based Medicine

Aaron M. Cohen; P. Zoë Stavri; William R. Hersh

The major criticisms and limitations of Evidence-Based Medicine (EBM) appearing in the literature over the past decade can be summarized and categorized into five recurring themes. The themes include: reliance on empiricism, narrow definition of evidence, lack of evidence of efficacy, limited usefulness for individual patients, and threats to the autonomy of the doctor/patient relationship. Analysis of EBM according to these themes leads to the conclusion that EBM can be a useful tool, but has severe drawbacks when used in isolation in the practice of individual patient care. Modern medicine must strive to balance an extremely complex set of priorities. To be an effective aid in achieving this balance, the theory and practice of EBM must expand to include new methods of study design and integration, and must adapt to the needs of both patients and the health care system in order to provide patients with the best care at the lowest cost.


Journal of the American Medical Informatics Association | 2006

Reducing Workload in Systematic Review Preparation Using Automated Citation Classification

Aaron M. Cohen; William R. Hersh; K. Peterson; Po-Yin Yen

OBJECTIVE To determine whether automated classification of document citations can be useful in reducing the time spent by experts reviewing journal articles for inclusion in updating systematic reviews of drug class efficacy for treatment of disease. DESIGN A test collection was built using the annotated reference files from 15 systematic drug class reviews. A voting perceptron-based automated citation classification system was constructed to classify each article as containing high-quality, drug class-specific evidence or not. Cross-validation experiments were performed to evaluate performance. MEASUREMENTS Precision, recall, and F-measure were evaluated at a range of sample weightings. Work saved over sampling at 95% recall was used as the measure of value to the review process. RESULTS A reduction in the number of articles needing manual review was found for 11 of the 15 drug review topics studied. For three of the topics, the reduction was 50% or greater. CONCLUSION Automated document citation classification could be a useful tool in maintaining systematic reviews of the efficacy of drug therapy. Further work is needed to refine the classification system and determine the best manner to integrate the system into the production of systematic reviews.
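The "work saved over sampling at 95% recall" measure named in this abstract can be illustrated with a short sketch. It assumes the commonly cited definition WSS = (TN + FN) / N - (1 - recall), evaluated at the first ranked cutoff that reaches the target recall. The function name, the score-based ranking, and the cutoff rule are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def wss_at_recall(scores: Sequence[float], labels: Sequence[int],
                  target_recall: float = 0.95) -> float:
    """Work saved over sampling at a target recall level.

    Assumed definition: WSS = (TN + FN) / N - (1 - recall), evaluated at
    the smallest ranked cutoff whose recall reaches target_recall.
    labels are 1 for articles that belong in the review, 0 otherwise.
    """
    n = len(labels)
    total_pos = sum(labels)
    if n == 0 or total_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank articles from most to least likely to be included.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    found = 0
    for reviewed, (_, label) in enumerate(ranked, start=1):
        found += label
        recall = found / total_pos
        if recall >= target_recall:
            # Articles below the cutoff are never read: TN + FN = n - reviewed.
            return (n - reviewed) / n - (1.0 - recall)
    raise AssertionError("unreachable: recall is 1.0 after the last article")
```

Under this definition, a classifier that ranks every relevant article first on a topic with 2 relevant articles out of 10 would save 80% of the reading, while a ranking that places a relevant article last would save nothing.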


Genome Biology | 2008

Text mining for biology - the way forward: opinions from leading scientists

Russ B. Altman; Casey M. Bergman; Judith A. Blake; Christian Blaschke; Aaron M. Cohen; Frank Gannon; Les Grivell; Udo Hahn; William R. Hersh; Lynette Hirschman; Lars Juhl Jensen; Martin Krallinger; Barend Mons; Seán I. O'Donoghue; Manuel C. Peitsch; Dietrich Rebholz-Schuhmann; Hagit Shatkay; Alfonso Valencia

This article collects opinions from leading scientists about how text mining can provide better access to the biological literature, how the scientific community can help with this process, what the next steps are, and what role future BioCreative evaluations can play. The responses identify several broad themes, including the possibility of fusing literature and biological databases through text mining; the need for user interfaces tailored to different classes of users and supporting community-based annotation; the importance of scaling text mining technology and inserting it into larger workflows; and suggestions for additional challenge evaluations, new applications, and additional resources needed to make progress.


PLOS ONE | 2015

Plasma Exosomal miRNAs in Persons with and without Alzheimer Disease: Altered Expression and Prospects for Biomarkers

Giovanni Lugli; Aaron M. Cohen; David A. Bennett; Raj C. Shah; Christopher J. Fields; Alvaro G. Hernandez; Neil R. Smalheiser

To assess the value of exosomal miRNAs as biomarkers for Alzheimer disease (AD), the expression of microRNAs was measured in a plasma fraction enriched in exosomes by differential centrifugation, using Illumina deep sequencing. Samples from 35 persons with a clinical diagnosis of AD dementia were compared to 35 age and sex matched controls. Although these samples contained less than 0.1 microgram of total RNA, deep sequencing gave reliable and informative results. Twenty miRNAs showed significant differences in the AD group in initial screening (miR-23b-3p, miR-24-3p, miR-29b-3p, miR-125b-5p, miR-138-5p, miR-139-5p, miR-141-3p, miR-150-5p, miR-152-3p, miR-185-5p, miR-338-3p, miR-342-3p, miR-342-5p, miR-548at-5p, miR-659-5p, miR-3065-5p, miR-3613-3p, miR-3916, miR-4772-3p, miR-5001-3p), many of which satisfied additional biological and statistical criteria, and among which a panel of seven miRNAs were highly informative in a machine learning model for predicting AD status of individual samples with 83–89% accuracy. This performance is not due to over-fitting, because a) we used separate samples for training and testing, and b) similar performance was achieved when tested on technical replicate data. Perhaps the most interesting single miRNA was miR-342-3p, which was a) expressed in the AD group at about 60% of control levels, b) highly correlated with several of the other miRNAs that were significantly down-regulated in AD, and c) was also reported to be down-regulated in AD in two previous studies. The findings warrant replication and follow-up with a larger cohort of patients and controls who have been carefully characterized in terms of cognitive and imaging data, other biomarkers (e.g., CSF amyloid and tau levels) and risk factors (e.g., apoE4 status), and who are sampled repeatedly over time. Integrating miRNA expression data with other data is likely to provide informative and robust biomarkers in Alzheimer disease.


BMC Bioinformatics | 2005

Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts

Aaron M. Cohen; William R. Hersh; Christopher Dubay; Kent A. Spackman

Background: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilities of knowledge extraction. The method was applied to the task of automatic gene and protein name synonym extraction. Results: Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs. Conclusion: The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge.


Intelligent Systems in Molecular Biology | 2005

Unsupervised Gene/Protein Named Entity Normalization Using Automatically Extracted Dictionaries

Aaron M. Cohen

Gene and protein named-entity recognition (NER) and normalization is often treated as a two-step process. While the first step, NER, has received considerable attention over the last few years, normalization has received much less attention. We have built a dictionary based gene and protein NER and normalization system that requires no supervised training and no human intervention to build the dictionaries from online genomics resources. We have tested our system on the Genia corpus and the BioCreative Task 1B mouse and yeast corpora and achieved a level of performance comparable to state-of-the-art systems that require supervised learning and manual dictionary creation. Our technique should also work for organisms following similar naming conventions as mouse, such as human. Further evaluation and improvement of gene/protein NER and normalization systems is somewhat hampered by the lack of larger test collections and collections for additional organisms, such as human.


Journal of Biomedical Discovery and Collaboration | 2006

Enhancing access to the Bibliome: the TREC 2004 Genomics Track

William R. Hersh; Ravi Teja Bhupatiraju; Laura Ross; Phoebe M. Roberts; Aaron M. Cohen; Dale F. Kraemer

Background: The goal of the TREC Genomics Track is to improve information retrieval in the area of genomics by creating test collections that will allow researchers to improve and better understand failures of their systems. The 2004 track included an ad hoc retrieval task, simulating use of a search engine to obtain documents about biomedical topics. This paper describes the Genomics Track of the Text Retrieval Conference (TREC) 2004, a forum for evaluation of IR research systems, where retrieval in the genomics domain has recently begun to be assessed. Results: A total of 27 research groups submitted 47 different runs. The most effective runs, as measured by the primary evaluation measure of mean average precision (MAP), used a combination of domain-specific and general techniques. The best MAP obtained by any run was 0.4075. Techniques that expanded queries with gene name lists as well as words from related articles had the best efficacy. However, many runs performed more poorly than a simple baseline run, indicating that careful selection of system features is essential. Conclusion: Various approaches to ad hoc retrieval provide a diversity of efficacy. The TREC Genomics Track and its test collection resources provide tools that allow improvement in information retrieval systems.
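For readers unfamiliar with mean average precision (MAP), the measure used to rank runs in this abstract, the following is a minimal sketch of the standard textbook computation. The data layout (a dict of ranked document IDs per topic and a dict of relevant IDs per topic) and the function names are assumptions for illustration. This is not the track's official scoring tool, and it assumes each ranking contains no duplicate document IDs.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average of the precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(run: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of per-topic average precision over all judged topics."""
    if not qrels:
        raise ValueError("no topics to evaluate")
    total = sum(average_precision(run.get(topic, []), relevant)
                for topic, relevant in qrels.items())
    return total / len(qrels)
```

A topic missing from the run is scored as an empty ranking, so it lowers the mean instead of being silently skipped.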


Journal of the American Medical Informatics Association | 2009

Cross-Topic Learning for Work Prioritization in Systematic Review Creation and Update

Aaron M. Cohen; Kyle H. Ambert; Marian McDonagh

OBJECTIVE Machine learning systems can be an aid to experts performing systematic reviews (SRs) by automatically ranking journal articles for work-prioritization. This work investigates whether a topic-specific automated document ranking system for SRs can be improved using a hybrid approach, combining topic-specific training data with data from other SR topics. DESIGN A test collection was built using annotated reference files from 24 systematic drug class reviews. A support vector machine learning algorithm was evaluated with cross-validation, using seven different fractions of topic-specific training data in combination with samples from the other 23 topics. This approach was compared to both a baseline system, which used only topic-specific training data, and to a system using only the nontopic data sampled from the remaining topics. MEASUREMENTS Mean area under the receiver-operating curve (AUC) was used as the measure of comparison. RESULTS On average, the hybrid system improved mean AUC over the baseline system by 20%, when topic-specific training data were scarce. The system performed significantly better than the baseline system at all levels of topic-specific training data. In addition, the system performed better than the nontopic system at all but the two smallest fractions of topic specific training data, and no worse than the nontopic system with these smallest amounts of topic specific training data. CONCLUSIONS Automated literature prioritization could be helpful in assisting experts to organize their time when performing systematic reviews. Future work will focus on extending the algorithm to use additional sources of topic-specific data, and on embedding the algorithm in an interactive system available to systematic reviewers during the literature review process.
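The area under the receiver-operating curve used as the comparison measure here can be read as the probability that a randomly chosen relevant article is ranked above a randomly chosen irrelevant one. The sketch below computes it directly from that pairwise definition. The function name and inputs are illustrative assumptions, and the quadratic loop is chosen for clarity over speed. Averaging the result across cross-validation folds and topics would give a mean AUC.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    labels are 1 for positive examples and 0 for negative examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```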


Journal of Biomedical Discovery and Collaboration | 2006

The TREC 2004 genomics track categorization task: classifying full text biomedical documents

Aaron M. Cohen; William R. Hersh

Background: The TREC 2004 Genomics Track focused on applying information retrieval and text mining techniques to improve the use of genomic information in biomedicine. The Genomics Track consisted of two main tasks, ad hoc retrieval and document categorization. In this paper, we describe the categorization task, which focused on the classification of full-text documents, simulating the task of curators of the Mouse Genome Informatics (MGI) system and consisting of three subtasks. One subtask of the categorization task required the triage of articles likely to have experimental evidence warranting the assignment of GO terms, while the other two subtasks were concerned with the assignment of the three top-level GO categories to each paper containing evidence for these categories. Results: The track had 33 participating groups. The mean utility measure for the triage subtask was 0.3303, with a top score of 0.6512. No system was able to substantially improve results over simply using the MeSH term Mice. Overlap of significant features between the training and test sets was found to be less than expected. Sample coverage of GO terms assigned to papers in the collection was very sparse. Determining papers containing GO term evidence will likely need to be treated as separate tasks for each concept represented in GO, and therefore require much denser sampling than was available in the data sets. The annotation subtask had a mean F-measure of 0.3824, with a top score of 0.5611. The mean F-measure for the annotation plus evidence codes subtask was 0.3676, with a top score of 0.4224. Gene name recognition was found to be of benefit for this task. Conclusion: Automated classification of documents for GO annotation is a challenging task, as was the automated extraction of GO code hierarchies and evidence codes. However, automating these tasks would provide substantial benefit to biomedical curation, and therefore work in this area must continue. Additional experience will allow comparison and further analysis about which algorithmic features are most useful in biomedical document classification, and better understanding of the task characteristics that make automated classification feasible and useful for biomedical document curation. The TREC Genomics Track will be continuing in 2005 focusing on a wider range of triage tasks and improving results from 2004.


Journal of the American Medical Informatics Association | 2008

Five-way Smoking Status Classification Using Text Hot-Spot Identification and Error-correcting Output Codes

Aaron M. Cohen

We participated in the i2b2 smoking status classification challenge task. The purpose of this task was to evaluate the ability of systems to automatically identify patient smoking status from discharge summaries. Our submission included several techniques that we compared and studied, including hot-spot identification, zero-vector filtering, inverse class frequency weighting, error-correcting output codes, and post-processing rules. We evaluated our approaches using the same methods as the i2b2 task organizers, using micro- and macro-averaged F1 as the primary performance metric. Our best performing system achieved a micro-F1 of 0.9000 on the test collection, equivalent to the best performing system submitted to the i2b2 challenge. Hot-spot identification, zero-vector filtering, classifier weighting, and error correcting output coding contributed additively to increased performance, with hot-spot identification having by far the largest positive effect. High performance on automatic identification of patient smoking status from discharge summaries is achievable with the efficient and straightforward machine learning techniques studied here.
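The difference between the micro- and macro-averaged F1 scores mentioned in this abstract can be shown with a small sketch for single-label, multi-class predictions. The function name and the placeholder class labels in the usage note are hypothetical. This is a generic illustration and not the i2b2 organizers' evaluation code.

```python
from collections import Counter
from typing import Sequence, Tuple


def micro_macro_f1(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) for single-label multi-class predictions."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # true class gains a false negative

    def f1(t: int, f_pos: int, f_neg: int) -> float:
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    classes = sorted(set(gold) | set(pred))
    # Macro: average the per-class F1, so rare classes weigh as much as common ones.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: pool the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro
```

For example, `micro_macro_f1(["a", "a", "b", "c"], ["a", "b", "b", "c"])` pools three correct predictions and one error into the micro score, while the macro score averages the three per-class F1 values. With exactly one label per document, micro-F1 equals plain accuracy.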

Collaboration


Dive into Aaron M. Cohen's collaborations.

Top Co-Authors

Neil R. Smalheiser

University of Illinois at Chicago

Clement T. Yu

University of Illinois at Chicago

John M. Davis

University of Illinois at Chicago

Clive E Adams

University of Nottingham

Philip S. Yu

University of Illinois at Chicago
