Andrew MacKinlay | Researchain

Publication

Featured researches published by Andrew MacKinlay.

North American Chapter of the Association for Computational Linguistics | 2009

Biomedical Event Annotation with CRFs and Precision Grammars

Andrew MacKinlay; David Martinez; Timothy Baldwin

This work describes a system for the tasks of identifying events in biomedical text and marking those that are speculative or negated. The architecture of the system relies on both Machine Learning (ML) approaches and hand-coded precision grammars. We submitted the output of our approach to the event extraction shared task at BioNLP 2009, where our methods suffered from low recall, although we were one of the few teams to provide answers for task 3.

BMC Medical Informatics and Decision Making | 2012

Detecting modification of biomedical events using a deep parsing approach

Andrew MacKinlay; David Martinez; Timothy Baldwin

Background: This work describes a system for identifying event mentions in bio-molecular research abstracts that are either speculative (e.g. analysis of IkappaBalpha phosphorylation, where it is not specified whether phosphorylation did or did not occur) or negated (e.g. inhibition of IkappaBalpha phosphorylation, where phosphorylation did not occur). The data comes from a standard dataset created for the BioNLP 2009 Shared Task. The system uses a machine-learning approach, where the features used for classification are a combination of shallow features derived from the words of the sentences and more complex features based on the semantic outputs produced by a deep parser. Method: To detect event modification, we use a Maximum Entropy learner with features extracted from the data relative to the trigger words of the events. The shallow features are bag-of-words features based on a small sliding context window of 3-4 tokens on either side of the trigger word. The deep parser features are derived from parses produced by the English Resource Grammar and the RASP parser. The outputs of these parsers are converted into the Minimal Recursion Semantics formalism, and from this, we extract features motivated by linguistics and the data itself. All of these features are combined to create training or test data for the machine learning algorithm. Results: Over the test data, our methods produce approximately a 4% absolute increase in F-score for detection of event modification compared to a baseline based only on the shallow bag-of-words features. Conclusions: Our results indicate that grammar-based techniques can enhance the accuracy of methods for detecting event modification.
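The shallow features described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `window_features`, the `bow=` feature prefix, the lowercasing step, and whitespace tokenisation are all assumptions made for illustration. The abstract states only that the features are bag-of-words over a window of 3-4 tokens on either side of the trigger word.

```python
from collections import Counter


def window_features(tokens, trigger_index, width=3):
    """Bag-of-words features from `width` tokens on either side of the trigger.

    Hypothetical helper: the name, the "bow=" prefix and the lowercasing
    are illustrative choices, not taken from the paper.
    """
    left = tokens[max(0, trigger_index - width):trigger_index]
    right = tokens[trigger_index + 1:trigger_index + 1 + width]
    # Bag of words: token order and side are discarded, only counts are kept.
    return Counter(f"bow={tok.lower()}" for tok in left + right)


# Example sentence fragment with "phosphorylation" (index 3) as the trigger.
tokens = "analysis of IkappaBalpha phosphorylation in T cells".split()
features = window_features(tokens, trigger_index=3)
```

A feature dictionary of this kind could then be passed, together with the parser-derived features, to any maximum entropy (multinomial logistic regression) learner.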

Pacific Symposium on Biocomputing | 2012

Detection of Protein Catalytic Sites in the Biomedical Literature

Karin Verspoor; Andrew MacKinlay; Judith D. Cohn; Michael E. Wall

This paper explores the application of text mining to the problem of detecting protein functional sites in the biomedical literature, and specifically considers the task of identifying catalytic sites in that literature. We provide strong evidence for the need for text mining techniques that address residue-level protein function annotation through an analysis of two corpora in terms of their coverage of curated data sources. We also explore the viability of building a text-based classifier for identifying protein functional sites, identifying the low coverage of curated data sources and the potential ambiguity of information about protein functional sites as challenges that must be addressed. Nevertheless we produce a simple classifier that achieves a reasonable ∼69% F-score on our full text silver corpus on the first attempt to address this classification task. The work has application in computational prediction of the functional significance of protein sites as well as in curation workflows for databases that capture this information.

Proceedings of BioNLP 15 | 2015

Investigating Public Health Surveillance using Twitter

Antonio Jimeno Yepes; Andrew MacKinlay; Bo Han

Microblog services such as Twitter are an attractive source of data for public health surveillance, as they avoid the legal and technical obstacles to accessing the more obvious and targeted sources of health information. Only a tiny fraction of tweets may contain useful public health information, but in Twitter this is offset by the sheer volume of tweets posted. We present a system which can identify medical named entities in a real-time stream of Twitter posts and determine their geographic locations, as well as preliminary experiments in using this information for health surveillance purposes.

Artificial Intelligence in Medicine | 2014

Cross-hospital portability of information extraction of cancer staging information

David Martinez; Graham Pitson; Andrew MacKinlay; Lawrence Cavedon

OBJECTIVE We address the task of extracting information from free-text pathology reports, focusing on staging information encoded by the TNM (tumour-node-metastases) and ACPS (Australian clinico-pathological stage) systems. Staging information is critical for diagnosing the extent of cancer in a patient and for planning individualised treatment. Extracting such information into more structured form saves time, improves reporting, and underpins the potential for automated decision support. METHODS AND MATERIAL We investigate the portability of a text mining model constructed from records from one health centre, by applying it directly to the extraction task over a set of records from a different health centre, with different reporting narrative characteristics. Other than a simple normalisation step on features associated with target labels, we apply the models from one system directly to the other. RESULTS The best F-scores for in-hospital experiments are 81%, 85%, and 94% (for staging T, N, and M respectively), while best cross-hospital F-scores reach 84%, 81%, and 91% for the same respective categories. CONCLUSIONS Our performance results compare favourably to the best levels reported in the literature, and--most relevant to our aim here--the cross-corpus results demonstrate the portability of the models we developed.

BMC Bioinformatics | 2015

Optimizing graph-based patterns to extract biomedical events from the literature.

Haibin Liu; Karin Verspoor; Donald C. Comeau; Andrew MacKinlay; W. John Wilbur

In BioNLP-ST 2013: We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method is based on the search for an approximate subgraph isomorphism between key context dependencies of events and graphs of input sentences. Our system was able to address both the GENIA (GE) task focusing on 13 molecular biology related event types and the Cancer Genetics (CG) task targeting a challenging group of 40 cancer biology related event types with varying arguments concerning 18 kinds of biological entities. In addition to adapting our system to the two tasks, we also attempted to integrate semantics into the graph matching scheme using a distributional similarity model for more events, and evaluated the event extraction impact of using paths of all possible lengths as key context dependencies beyond using only the shortest paths in our system. We achieved a 46.38% F-score in the CG task (ranking 3rd) and a 48.93% F-score in the GE task (ranking 4th). After BioNLP-ST 2013: We explored three ways to further extend our event extraction system in our previously published work: (1) We allowed non-essential nodes to be skipped, and incorporated a node skipping penalty into the subgraph distance function of our approximate subgraph matching algorithm. (2) Instead of assigning a unified subgraph distance threshold to all patterns of an event type, we learned a customized threshold for each pattern. (3) We implemented the well-known Empirical Risk Minimization (ERM) principle to optimize the event pattern set by balancing prediction errors on training data against regularization. When evaluated on the official GE task test data, these extensions help to improve the extraction precision from 62% to 65%. However, the overall F-score stays equivalent to the previous performance due to a 1% drop in recall.
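The closing claim, that a 3-point precision gain is cancelled by a 1-point recall drop, can be checked with a short worked calculation. This is only a sketch under one assumption not stated in the abstract: that the 62% precision figure belongs to the same GE run as the 48.93% F-score. The baseline recall is not reported, so it is back-solved from those two rounded numbers and is approximate.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def recall_from(f, precision):
    """Solve F = 2PR / (P + R) for R, given F and P."""
    return f * precision / (2 * precision - f)


# Assumption: the 48.93% F-score and the 62% precision describe the same run.
baseline_f = 0.4893
baseline_p = 0.62
baseline_r = recall_from(baseline_f, baseline_p)  # implied, not reported

# Extended system: precision rises to 65%, recall drops by one point.
extended_f = f_score(0.65, baseline_r - 0.01)
```

Under this assumption the two F-scores differ by only a fraction of a percentage point, which is consistent with the abstract's statement that overall performance stayed equivalent.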

Proceedings of the ACM fifth international workshop on Data and text mining in biomedical informatics | 2011

A parser-based approach to detecting modification of biomedical events

Andrew MacKinlay; David Martinez; Timothy Baldwin

This work describes a system for identifying event mentions in bio-molecular text that are either speculative (e.g. analysis of IkappaBalpha phosphorylation, where it is not specified whether phosphorylation did or did not occur) or negated (e.g. inhibition of IkappaBalpha phosphorylation, where phosphorylation did not occur). Our system combines a simple bag-of-words approach with two grammar-based approaches, namely the English Resource Grammar and the RASP parser. We interpret the output of the respective parsers via MRS semantics, and feed them into a machine learner. Our results indicate that grammar-based techniques can enhance the accuracy of methods for detecting event modification.

Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics | 2012

Extracting structured information from free-text medication prescriptions using dependencies

Andrew MacKinlay; Karin Verspoor

We explore an information extraction task where the goal is to determine the correct values for fields which are relevant to prescription drug administration such as dosage amount, frequency and route. The data set is a collection of prescriptions from a long-term health-care facility, a small subset of which we have manually annotated with values for these fields. We first examine a rule-based approach to the task, which uses a dependency parse of the prescription, achieving accuracies of 60-95% over various different fields, and 67.5% when all fields of the prescription are considered together. The outputs of such a system have potential applications in detecting irregularities in dosage delivery.

bioRxiv | 2018

A hybrid approach for automated mutation annotation of the extended human mutation landscape in scientific literature

Antonio Jimeno Yepes; Andrew MacKinlay; Natalie Gunn; Christine Schieber; Noel G. Faux; Matthew T. Downton; Benjamin Goudey; Richard L. Martin

As the cost of DNA sequencing continues to fall, an increasing amount of information on human genetic variation is being produced that could help progress precision medicine. However, information about such mutations is typically first made available in the scientific literature, and is then later manually curated into more standardized genomic databases. This curation process is expensive and time-consuming, and many variants are never fully curated, if at all. Detecting mutations in the literature is the first key step towards automating this process. However, most of the current methods have focused on identifying mutations that follow existing nomenclatures. In this work, we show that there is a large number of mutations that are missed by using this standard approach. Furthermore, we implement the first mutation annotator to cover an extended mutation landscape, and we show that its F1 performance is comparable to that of human annotation (F1 78.29 for manual annotation vs F1 79.56 for automatic annotation).

Clinical Colorectal Cancer | 2018

Stage-based Variation in the Effect of Primary Tumor Side on All Stages of Colorectal Cancer Recurrence and Survival

Margaret Lee; Andrew MacKinlay; Christine Semira; Christine Schieber; Antonio Jimeno Yepes; Belinda Lee; Rachel Wong; Chathurika K.H. Hettiarachchige; Natalie Gunn; Jeanne Tie; Hui-Li Wong; Iain Skinner; Ian Jones; James Keck; Suzanne Kosmider; Ben Tran; Kathryn Maree Field; Peter Gibbs

Micro‐Abstract: Although the predictive and prognostic effect of primary tumor side in metastatic colorectal cancer is now widely accepted, it is poorly defined for early‐stage disease. In the present analysis of > 6500 patients, we found stage‐by‐stage differences in survival outcomes according to the primary tumor location, which was partially attributable to differences in survival after recurrence. However, the primary tumor location did not influence the benefit of adjuvant chemotherapy. Background: Multiple studies have defined the prognostic and potential predictive significance of the primary tumor side in metastatic colorectal cancer (CRC). However, the currently available data for early‐stage disease are limited and inconsistent. Materials and Methods: We explored the clinicopathologic, treatment, and outcome data from a multisite Australian CRC registry from 2003 to 2016. Tumors at and distal to the splenic flexure were considered a left primary (LP). Results: For the 6547 patients identified, the median age at diagnosis was 69 years, 55% were men, and most (63%) had a LP. Comparing the outcomes for right primary (RP) versus LP, time‐to‐recurrence was similar for stage I and III disease, but longer for those with a stage II RP (hazard ratio [HR], 0.68; 95% confidence interval [CI], 0.52–0.90; P < .01). Adjuvant chemotherapy provided a consistent benefit in stage III disease, regardless of the tumor side. Overall survival (OS) was similar for those with stage I and II disease between LP and RP patients; however, those with stage III RP disease had poorer OS (HR, 1.30; 95% CI, 1.04–1.62; P < .05) and cancer‐specific survival (HR, 1.55; 95% CI, 1.19–2.03; P < .01). Patients with stage IV RP, whether de novo metastatic (HR, 1.15; 95% CI, 0.95–1.39) or relapsed post–early‐stage disease (HR, 1.35; 95% CI, 1.11–1.65; P < .01), had poorer OS. 
Conclusion: In early‐stage CRC, the association of tumor side and effect on the time‐to‐recurrence and OS varies by stage. In stage III patients with an RP, poorer OS and cancer‐specific survival outcomes are, in part, driven by inferior survival after recurrence, and tumor side did not influence adjuvant chemotherapy benefit.
