Publication


Featured research published by Abeed Sarker.


Journal of Biomedical Informatics | 2015

Utilizing social media data for pharmacovigilance

Abeed Sarker; Rachel E. Ginn; Azadeh Nikfarjam; Karen O'Connor; Karen Smith; Swetha Jayaraman; Tejaswi Upadhaya; Graciela Gonzalez

OBJECTIVE Automatic monitoring of Adverse Drug Reactions (ADRs), defined as adverse patient outcomes caused by medications, is a challenging research problem that is currently receiving significant attention from the medical informatics community. In recent years, user-posted data on social media, primarily due to its sheer volume, has become a useful resource for ADR monitoring. Research using social media data has progressed using various data sources and techniques, making it difficult to compare distinct systems and their performances. In this paper, we perform a methodical review to characterize the different approaches to ADR detection/extraction from social media, and their applicability to pharmacovigilance. In addition, we present a potential systematic pathway to ADR monitoring from social media. METHODS We identified studies describing approaches for ADR detection from social media from the Medline, Embase, Scopus and Web of Science databases, and the Google Scholar search engine. Studies that met our inclusion criteria were those that attempted to extract ADR information posted by users on any publicly available social media platform. We categorized the studies according to different characteristics such as primary ADR detection approach, size of corpus, data source(s), availability, and evaluation criteria. RESULTS Twenty-two studies met our inclusion criteria, with fifteen (68%) published within the last two years. However, publicly available annotated data is still scarce, and we found only six studies that made the annotations used publicly available, making system performance comparisons difficult. In terms of algorithms, supervised classification techniques to detect posts containing ADR mentions, and lexicon-based approaches for extraction of ADR mentions from texts have been the most popular. CONCLUSION Our review suggests that interest in the utilization of the vast amounts of available social media data for ADR monitoring is increasing. 
In terms of sources, both health-related and general social media data have been used for ADR detection: while health-related sources tend to contain higher proportions of relevant data, the volume of data from general social media websites is significantly higher. There is still a very limited amount of annotated data publicly available, and, as indicated by the promising results obtained by recent supervised learning approaches, there is a strong need to make such data available to the research community.


Journal of the American Medical Informatics Association | 2015

Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features

Azadeh Nikfarjam; Abeed Sarker; Karen O'Connor; Rachel E. Ginn; Graciela Gonzalez

OBJECTIVE Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. METHODS We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words’ semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. RESULTS ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. CONCLUSION It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets.
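
The word cluster feature described in the abstract above can be illustrated with a minimal sketch. This is not ADRMine's code: the two-dimensional vectors, the centroids, and the names `EMBEDDINGS`, `CENTROIDS`, `cluster_id`, and `token_features` are all hypothetical, chosen only to show how a cluster ID derived from embeddings could become one entry in a per-token feature dictionary of the kind a CRF toolkit consumes.

```python
import math

# Hypothetical 2-D "embeddings"; real word vectors have many more dimensions
# and would be learned from unlabeled social media posts.
EMBEDDINGS = {
    "headache": (0.9, 0.1),
    "migraine": (0.8, 0.2),
    "nausea": (0.7, 0.3),
    "great": (0.1, 0.9),
    "awesome": (0.2, 0.8),
}

# Hypothetical cluster centroids, e.g. as a clustering step might produce.
CENTROIDS = [(0.8, 0.2), (0.15, 0.85)]


def cluster_id(word):
    """Return the index of the nearest centroid, or -1 for unknown words."""
    vec = EMBEDDINGS.get(word.lower())
    if vec is None:
        return -1
    distances = [math.dist(vec, centroid) for centroid in CENTROIDS]
    return distances.index(min(distances))


def token_features(tokens, i):
    """Build a feature dictionary for token i, including cluster IDs."""
    return {
        "word": tokens[i].lower(),
        "cluster": cluster_id(tokens[i]),
        "prev_cluster": cluster_id(tokens[i - 1]) if i > 0 else -1,
    }


tokens = ["This", "drug", "gave", "me", "a", "migraine"]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Because "migraine" and "headache" fall in the same cluster, a sequence labeler trained on one can generalize to the other even if only one of them appears in the annotated data, which is the intuition behind relying on unlabeled text.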


Journal of Biomedical Informatics | 2015

Portable automatic text classification for adverse drug reaction detection via multi-corpus training

Abeed Sarker; Graciela Gonzalez

OBJECTIVE Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media, where enormous amounts of user-posted data are available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user-posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. METHODS One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. RESULTS Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively.
CONCLUSIONS Our research results indicate that using advanced NLP techniques for generating information-rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future.
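
The ADR class F-score reported in this abstract is conventionally the harmonic mean of precision and recall for the ADR class. The sketch below shows that computation on hypothetical labels (the label strings and the function name `adr_f_score` are illustrative, not taken from the paper), and then checks the reported improvements under the assumption that one "unit" means 0.01 of F-score.

```python
def adr_f_score(gold, predicted, positive="ADR"):
    """F-score for the positive class from parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted labels for six posts.
gold = ["ADR", "ADR", "noADR", "noADR", "ADR", "noADR"]
pred = ["ADR", "noADR", "noADR", "ADR", "ADR", "noADR"]
score = adr_f_score(gold, pred)

# Reported single-corpus and multi-corpus F-scores for the two in-house sets.
# Assumption: a "unit" in the abstract is one hundredth of F-score.
gains = [round((after - before) * 100, 1)
         for before, after in [(0.538, 0.597), (0.678, 0.704)]]
```

Under that assumption, `gains` reproduces the 5.9 and 2.6 unit improvements stated in the results.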


Journal of Biomedical Informatics | 2016

Analysis of the effect of sentiment analysis on extracting adverse drug reactions from tweets and forum posts

Ioannis Korkontzelos; Azadeh Nikfarjam; Matthew Shardlow; Abeed Sarker; Sophia Ananiadou; Graciela Gonzalez



Pacific Symposium on Biocomputing | 2016

Social Media Mining Shared Task Workshop

Abeed Sarker; Azadeh Nikfarjam; Graciela Gonzalez

Social media has evolved into a crucial resource for obtaining large volumes of real-time information. The promise of social media has been realized by the public health domain, and recent research has addressed some important challenges in that domain by utilizing social media data. Tasks such as monitoring flu trends, viral disease outbreaks, medication abuse, and adverse drug reactions are some examples of studies where data from social media have been exploited. The focus of this workshop is to explore solutions to three important natural language processing challenges for domain-specific social media text: (i) text classification, (ii) information extraction, and (iii) concept normalization. To explore different approaches to solving these problems on social media data, we designed a shared task which was open to participants globally. We designed three tasks using our in-house annotated Twitter data on adverse drug reactions. Task 1 involved automatic classification of adverse drug reaction assertive user posts; Task 2 focused on extracting specific adverse drug reaction mentions from user posts; and Task 3, which was slightly ill-defined due to the complex nature of the problem, involved normalizing user mentions of adverse drug reactions to standardized concept IDs. A total of 11 teams participated, and a total of 24 system runs (18 for Task 1 and 6 for Task 2) were submitted. Following the evaluation of the systems, and an assessment of their innovation/novelty, we accepted 7 descriptive manuscripts for publication: 5 for Task 1 and 2 for Task 2. We provide descriptions of the tasks, data, and participating systems in this paper.


Data in Brief | 2017

A corpus for mining drug-related knowledge from Twitter chatter: Language models and their utilities

Abeed Sarker; Graciela Gonzalez

In this data article, we present to the data science, natural language processing and public health communities an unlabeled corpus and a set of language models. We collected the data from Twitter using drug names as keywords, including their common misspelled forms. Using this data, which is rich in drug-related chatter, we developed language models to aid the development of data mining tools and methods in this domain. We generated several models that capture (i) distributed word representations and (ii) probabilities of n-gram sequences. The data set we are releasing consists of 267,215 Twitter posts made during the four-month period from November 2014 to February 2015. The posts mention over 250 drug-related keywords. The language models encapsulate semantic and sequential properties of the texts.
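
To make "probabilities of n-gram sequences" concrete, here is a minimal sketch of a bigram model estimated by simple counting. It is not the released models or the authors' tooling: the three posts, the drug names in them, and the function name `bigram_probability` are hypothetical, and no smoothing is applied.

```python
from collections import Counter

# Hypothetical posts standing in for drug-related Twitter chatter.
posts = [
    "took adderall this morning",
    "took xanax this morning",
    "adderall this morning again",
]

unigrams = Counter()
bigrams = Counter()
for post in posts:
    # Sentence boundary markers let the model score first and last words.
    tokens = ["<s>"] + post.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))


def bigram_probability(prev, word):
    """Maximum-likelihood estimate of P(word | prev), without smoothing."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


p_adderall_after_took = bigram_probability("took", "adderall")
p_morning_after_this = bigram_probability("this", "morning")
```

A practical model over hundreds of thousands of posts would add smoothing so that unseen word pairs do not receive zero probability.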


Artificial Intelligence in Medicine in Europe | 2013

An Approach for Query-Focused Text Summarisation for Evidence Based Medicine

Abeed Sarker; Diego Mollá; Cécile Paris

We present an approach for extractive, query-focused, single-document summarisation of medical text. Our approach utilises a combination of target-sentence-specific and target-sentence-independent statistics derived from a corpus specialised for summarisation in the medical domain. We incorporate domain knowledge via the application of multiple domain-specific features, and we customise the answer extraction process for different question types. The use of carefully selected domain-specific features enables our summariser to generate content-rich extractive summaries, and an automatic evaluation of our system reveals that it outperforms other baseline and benchmark summarisation systems with a percentile rank of 96.8%.


BioNLP 2017 | 2017

Detecting Personal Medication Intake in Twitter: An Annotated Corpus and Baseline Classification System

Ari Z. Klein; Abeed Sarker; Masoud Rouhizadeh; Karen O'Connor; Graciela Gonzalez

Social media sites (e.g., Twitter) have been used for surveillance of drug safety at the population level, but studies that focus on the effects of medications on specific sets of individuals have had to rely on other sources of data. Mining social media data for this information would require the ability to distinguish indications of personal medication intake in these media. Towards that end, this paper presents an annotated corpus that can be used to train machine learning systems to determine whether a tweet that mentions a medication indicates that the individual posting has taken that medication (at a specific time). To demonstrate the utility of the corpus as a training set, we present baseline results of supervised classification.


Artificial Intelligence in Medicine | 2015

Automatic evidence quality prediction to support evidence-based decision making

Abeed Sarker; Diego Mollá; Cécile Paris

BACKGROUND Evidence-based medicine practice requires practitioners to obtain the best available medical evidence, and appraise the quality of the evidence when making clinical decisions. Primarily due to the plethora of electronically available data from the medical literature, the manual appraisal of the quality of evidence is a time-consuming process. We present a fully automatic approach for predicting the quality of medical evidence in order to aid practitioners at point-of-care. METHODS Our approach extracts relevant information from medical article abstracts and utilises data from a specialised corpus to apply supervised machine learning for the prediction of the quality grades. Following an in-depth analysis of the usefulness of features (e.g., publication types of articles), they are extracted from the text via rule-based approaches and from the meta-data associated with the articles, and then applied in the supervised classification model. We propose the use of a highly scalable and portable approach using a sequence of high precision classifiers, and introduce a simple evaluation metric called average error distance (AED) that simplifies the comparison of systems. We also perform elaborate human evaluations to compare the performance of our system against human judgments. RESULTS We test and evaluate our approaches on a publicly available, specialised, annotated corpus containing 1132 evidence-based recommendations. Our rule-based approach performs exceptionally well at the automatic extraction of publication types of articles, with F-scores of up to 0.99 for high-quality publication types. For evidence quality classification, our approach obtains an accuracy of 63.84% and an AED of 0.271. The human evaluations show that the performance of our system, in terms of AED and accuracy, is comparable to the performance of humans on the same data. 
CONCLUSIONS The experiments suggest that our structured text classification framework achieves evaluation results comparable to those of human performance. Our overall classification approach and evaluation technique are also highly portable and can be used for various evidence grading scales.


Digital Image Computing: Techniques and Applications | 2010

Improved Reconstruction of Flutter Shutter Images for Motion Blur Reduction

Abeed Sarker; Leonard G. C. Hamey

Relative motion between a camera and its subject introduces motion blur in captured images. Reconstruction of unblurred images is ill-posed due to the loss of spatial high frequencies. The flutter shutter preserves high frequencies by rapidly opening and closing the shutter during exposure, providing greatly improved reconstruction. We address two open problems in the reconstruction of unblurred images from flutter shutter images. Firstly, we propose a noise reduction technique that reduces reconstruction noise while preserving image detail. Secondly, we propose a semi-automatic technique for estimating the Point Spread Function of the motion blur. Together these techniques provide substantial improvement in reconstruction of flutter shutter images.
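
The claim that a fluttered shutter preserves high frequencies can be illustrated with a toy one-dimensional example. This is only a sketch under stated assumptions, not the authors' method: the two 8-sample exposure patterns and the function name `dft_magnitudes` are hypothetical, and real flutter shutter codes are much longer. A continuously open shutter acts as a box filter whose spectrum contains exact zeros, so those frequencies are lost and reconstruction is ill-posed; a coded pattern can keep every frequency component nonzero.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform of a real sequence."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


# Hypothetical exposure patterns over 8 time slots (1 = shutter open).
box = [1, 1, 1, 1, 0, 0, 0, 0]    # conventional exposure: open, then closed
coded = [1, 0, 1, 1, 0, 0, 0, 0]  # toy fluttered exposure

box_min = min(dft_magnitudes(box))      # effectively zero: frequencies lost
coded_min = min(dft_magnitudes(coded))  # bounded away from zero
```

Inverting the blur amounts to dividing by these magnitudes in the frequency domain, so a minimum near zero amplifies noise without bound, while a minimum well above zero keeps the inversion stable.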

Collaboration


Dive into Abeed Sarker's collaborations.

Top Co-Authors

Cécile Paris

Commonwealth Scientific and Industrial Research Organisation

Ari Z. Klein

University of Pennsylvania

Arjun Magge

Arizona State University

Karen O'Connor

Arizona State University
