Publication


Featured research published by Sarvnaz Karimi.


International World Wide Web Conferences | 2013

Location extraction from disaster-related microblogs

John Lingad; Sarvnaz Karimi; Jie Yin

Location information is critical to understanding the impact of a disaster, including where the damage is, where people need assistance and where help is available. We investigate the feasibility of applying Named Entity Recognizers to extract locations from microblogs, at the level of both geo-location and point-of-interest. Our experimental results show that such tools, once retrained on microblog data, have great potential to detect the "where" information, even at the granularity of point-of-interest.


ACM Computing Surveys | 2015

Text and Data Mining Techniques in Adverse Drug Reaction Detection

Sarvnaz Karimi; Chen Wang; Alejandro Metke-Jimenez; Raj Gaire; Cécile Paris

We review data mining and related computer science techniques that have been studied in the area of drug safety to identify signals of adverse drug reactions from different data sources, such as spontaneous reporting databases, electronic health records, and medical literature. Development of such techniques has become more crucial for public health, especially with the growth of data repositories that include either reports of adverse drug reactions, which require fast processing for discovering signals of adverse reactions, or data sources that may contain such signals but require data or text mining techniques to discover them. In order to highlight the importance of contributions made by computer scientists in this area so far, we categorize and review the existing approaches, and most importantly, we identify areas where more research should be undertaken.


Journal of Biomedical Informatics | 2015

Cadec: A corpus of adverse drug event annotations

Sarvnaz Karimi; Alejandro Metke-Jimenez; Madonna Kemp; Chen Wang

CSIRO Adverse Drug Event Corpus (Cadec) is a new, richly annotated corpus of medical forum posts on patient-reported Adverse Drug Events (ADEs). The corpus is sourced from posts on social media, and contains text that is largely written in colloquial language and often deviates from formal English grammar and punctuation rules. Annotations contain mentions of concepts such as drugs, adverse effects, symptoms, and diseases linked to their corresponding concepts in controlled vocabularies, i.e., SNOMED Clinical Terms and MedDRA. The quality of the annotations is ensured by annotation guidelines, multi-stage annotations, measuring inter-annotator agreement, and final review of the annotations by a clinical terminologist. This corpus is useful for studies in the area of information extraction, or more generally text mining, from social media to detect possible adverse drug reactions from direct patient reports. The corpus is publicly available at https://data.csiro.au.


Conference on Information and Knowledge Management | 2012

ESA: emergency situation awareness via microbloggers

Jie Yin; Sarvnaz Karimi; Bella Robinson; Mark A. Cameron

During a disastrous event, such as an earthquake or river flooding, information on what happened, who was affected and how, where help is needed, and how to aid people who were affected, is crucial. While communication is important in such times of crisis, damage to infrastructure such as telephone lines makes it difficult for authorities and victims to communicate. Microblogging has played a critical role as an important communication platform during crises when other media has failed. We demonstrate our ESA (Emergency Situation Awareness) system that mines microblogs in real-time to extract and visualise useful information about incidents and their impact on the community in order to equip the right authorities and the general public with situational awareness.


Australasian Document Computing Symposium | 2013

Classifying microblogs for disasters

Sarvnaz Karimi; Jie Yin; Cécile Paris

Monitoring social media in critical disaster situations can potentially assist emergency and media personnel to deal with events as they unfold, and focus their resources where they are most needed. We address the issue of filtering massive amounts of Twitter data to identify high-value messages related to disasters, and to further classify disaster-related messages into those pertaining to particular disaster types, such as earthquake, flooding, fire, or storm. Unlike post-hoc analysis that most previous studies have done, we focus on building a classification model on past incidents to detect tweets about current incidents. Our experimental results demonstrate the feasibility of using classification methods to identify disaster-related tweets. We analyse the effect of different features in classifying tweets and show that using generic features rather than incident-specific ones leads to better generalisation on the effectiveness of classifying unseen incidents.


BMC Medical Informatics and Decision Making | 2015

Automatic classification of diseases from free-text death certificates for real-time surveillance

Bevan Koopman; Sarvnaz Karimi; Anthony N. Nguyen; Rhydwyn McGuire; David Muscatello; Madonna Kemp; Donna Truran; Ming Zhang; Sarah Thackway

Background: Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV. Methods: Two classification methods are presented: i) a machine learning approach, where detailed features (terms, term n-grams and SNOMED CT concepts) are extracted from death certificates and used to train a set of supervised machine learning models (Support Vector Machines); and ii) a set of keyword-matching rules. These methods were used to identify the presence of diabetes, influenza, pneumonia and HIV in a death certificate. An empirical evaluation was conducted using 340,142 death certificates, divided between training and test sets, covering deaths from 2000–2007 in New South Wales, Australia. Precision and recall (positive predictive value and sensitivity) were used as evaluation measures, with F-measure providing a single, overall measure of effectiveness. A detailed error analysis was performed on classification errors. Results: Classification of diabetes, influenza, pneumonia and HIV was highly accurate (F-measure 0.96). More fine-grained ICD-10 classification effectiveness was more variable but still high (F-measure 0.80). The error analysis revealed that word variations as well as certain word combinations adversely affected classification. In addition, anomalies in the ground truth likely led to an underestimation of the effectiveness. Conclusions: The high accuracy and low cost of the classification methods allow for an effective means for automatic and real-time surveillance of diabetes, influenza, pneumonia and HIV deaths. In addition, the methods are generally applicable to other diseases of interest and to other sources of medical free-text besides death certificates.
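
The evaluation measures named in this abstract (precision, recall, and F-measure) can be illustrated with a minimal sketch. The function name and the toy labels below are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F-measure for binary labels.

    A label of 1 means "disease mentioned in the certificate", 0 means absent.
    Both arguments are equal-length sequences of 0/1 values (an assumption
    made for this sketch).
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives

    # Precision is the positive predictive value; recall is the sensitivity.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy example: six certificates with made-up gold and predicted labels.
gold_labels = [1, 1, 1, 0, 0, 1]
predicted_labels = [1, 1, 0, 0, 1, 1]
scores = precision_recall_f1(gold_labels, predicted_labels)
```

With these toy labels there are three true positives, one false positive and one false negative, so all three measures come out at 0.75.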


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014

Evaluation of text-processing algorithms for adverse drug event extraction from social media

Alejandro Metke-Jimenez; Sarvnaz Karimi; Cécile Paris

The discovery of suspected adverse drug reactions is no longer restricted to mining reports that pharmaceutical companies and health professionals send to regulators for possible safety signals. Patient forums and other social media are being studied for additional sources of information to assist in expediting adverse reaction discovery. Extracting information on drugs, adverse drug reactions, diseases and symptoms, or patient demographics from such media is an essential step of this process, but it is not straightforward. While most studies in this area use a lexicon-based information extraction methodology, they do not explicitly evaluate the impact of text-processing steps on their final results. We experimentally quantify the value of the most popular techniques to establish whether or not they benefit the information extraction process.


Australasian Document Computing Symposium | 2014

Pinpointing Locational Focus in Microblogs

Jie Yin; Sarvnaz Karimi; John Lingad

Extracting the geographical location that a tweet is about is crucial for many important applications ranging from disaster management to recommendation systems. We address the problem of finding the locational focus of tweets that is geographically identifiable on a map. Because of the short, noisy nature of tweets and inherent ambiguity of locations, tweet text alone cannot provide sufficient information for disambiguating the location mentions and inferring the actual location focus being referred to in a tweet. Therefore, we present a novel algorithm that identifies all location mentions from three information sources (tweet text, hashtags, and user profile) and then uses a gazetteer database to infer the most probable locational focus of a tweet. Our novel algorithm has the ability to infer a locational focus that may not be explicitly mentioned in the tweet and determine its most appropriate granularity, e.g., city or country.


Exploiting Semantic Annotations in Information Retrieval | 2015

CADEminer: A System for Mining Consumer Reports on Adverse Drug Side Effects

Sarvnaz Karimi; Alejandro Metke-Jimenez; Anthony Nguyen

We introduce CADEminer, a system that mines consumer reviews on medications in order to facilitate discovery of drug side effects that may not have been identified in clinical trials. CADEminer utilises search and natural language processing techniques to (a) extract mentions of side effects, and other relevant concepts such as drug names and diseases in reviews; (b) normalise the extracted mentions to their unified representation in ontologies such as SNOMED CT and MedDRA; (c) identify relationships between extracted concepts, such as a drug caused a side effect; (d) search in authoritative lists of known drug side effects to identify whether or not the extracted side effects are new and therefore require further investigation; and finally (e) provide statistics and visualisation of the data.


Computational Linguistics | 2015

Evaluation methods for statistically dependent text

Sarvnaz Karimi; Jie Yin; Jiri Baum

In recent years, many studies have been published on data collected from social media, especially microblogs such as Twitter. However, rather few of these studies have considered evaluation methodologies that take into account the statistically dependent nature of such data, which breaks the theoretical conditions for using cross-validation. Despite concerns raised in the past about using cross-validation for data of similar characteristics, such as time series, some of these studies evaluate their work using standard k-fold cross-validation. Through experiments on Twitter data collected during a two-year period that includes disastrous events, we show that by ignoring the statistical dependence of the text messages published in social media, standard cross-validation can result in misleading conclusions in a machine learning task. We explore alternative evaluation methods that explicitly deal with statistical dependence in text. Our work also raises concerns for any other data for which similar conditions might hold.
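
The contrast between standard k-fold cross-validation and an evaluation that respects time order can be sketched as follows. This is only an illustration under stated assumptions: the forward-chaining split shown here is one common alternative for time-dependent data and is not necessarily the method explored in the paper, and both function names are hypothetical.

```python
import random


def kfold_splits(n_items, k, seed=0):
    """Standard k-fold: indices are shuffled, so a test item may be older
    than the items used for training."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def time_ordered_splits(n_items, k):
    """Forward-chaining split: items are assumed to be sorted by timestamp,
    and each test block is evaluated with a model trained only on earlier
    items."""
    block = n_items // (k + 1)
    for i in range(1, k + 1):
        train = list(range(0, i * block))
        # The last test block absorbs any leftover items.
        end = (i + 1) * block if i < k else n_items
        test = list(range(i * block, end))
        yield train, test


# Example: ten messages in chronological order, four evaluation rounds.
for train_idx, test_idx in time_ordered_splits(10, 4):
    # Every training index precedes every test index.
    assert max(train_idx) < min(test_idx)
```

In the shuffled variant, near-duplicate messages from the same incident can land in both the training and test folds, which is the kind of dependence the abstract warns about.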

Collaboration


Dive into Sarvnaz Karimi's collaborations.

Top Co-Authors

Jie Yin

Commonwealth Scientific and Industrial Research Organisation

Alejandro Metke-Jimenez

Commonwealth Scientific and Industrial Research Organisation

Chen Wang

Commonwealth Scientific and Industrial Research Organisation

Cécile Paris

Commonwealth Scientific and Industrial Research Organisation

Anthony Nguyen

Commonwealth Scientific and Industrial Research Organisation

Bella Robinson

Commonwealth Scientific and Industrial Research Organisation

Madonna Kemp

Royal Brisbane and Women's Hospital

Mark A. Cameron

Commonwealth Scientific and Industrial Research Organisation

Anthony N. Nguyen

Royal Brisbane and Women's Hospital

David Muscatello

University of New South Wales
