
Publication


Featured research published by Jon Patrick.


Journal of the American Medical Informatics Association | 2010

High accuracy information extraction of medication information from clinical notes: 2009 i2b2 medication extraction challenge.

Jon Patrick; Min Li

OBJECTIVE Medication information is among the most valuable data in clinical records. This paper describes the use of a cascade of machine learners that automatically extract medication information from clinical records. DESIGN The authors developed a novel supervised learning model that incorporates two machine learning algorithms and several rule-based engines. MEASUREMENTS Evaluation of each step included precision, recall and F-measure metrics. The final outputs of the system were scored using the i2b2 workshop evaluation metrics, including strict and relaxed matching with a gold standard. RESULTS Evaluation results showed greater than 90% accuracy on five out of seven entities in the named entity recognition task, and an F-measure greater than 95% on the relationship classification task. The strict micro-averaged F-measure for the system output was the best submitted performance in the competition, at 85.65%. LIMITATIONS Clinical staff will only use practical processing systems if they have confidence in their reliability. The authors estimate that an acceptable accuracy for such a working system should be approximately 95%, which leaves a significant performance gap of 5 to 10% from current processing capabilities. CONCLUSION A multistage method with mixed computational strategies, using a combination of rule-based and statistical classifiers, appears to provide a near-optimal strategy for the automated extraction of medication information from clinical records.
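As a rough illustration of the evaluation metrics named in the abstract (precision, recall, F-measure, and a strict micro-averaged F pooled over entity types), the sketch below computes them from per-entity counts. The counts are invented examples, not the paper's data.

```python
# Precision, recall and F-measure from true-positive (tp), false-positive (fp)
# and false-negative (fn) counts; micro-averaging pools counts across types.

def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def micro_f(counts) -> float:
    """Micro-average: sum tp/fp/fn over all entity types first."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f_measure(tp, fp, fn)

# Hypothetical per-entity (tp, fp, fn) counts, e.g. drug name and dosage.
counts = [(90, 10, 10), (80, 5, 15)]
print(round(micro_f(counts), 4))  # → 0.8947
```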


BMC Medical Informatics and Decision Making | 2008

A computational linguistics motivated mapping of ICPC-2 PLUS to SNOMED CT.

Yefeng Wang; Jon Patrick; Graeme Miller; Julie O'Halloran

BACKGROUND A great challenge in sharing data across information systems in general practice is the lack of interoperability between the different terminologies or coding schemata used by those systems. Mapping of medical vocabularies to a standardised terminology is needed to solve data interoperability problems. METHODS We present a system to automatically map the interface terminology ICPC-2 PLUS to SNOMED CT. Three stages of mapping are proposed. UMLS Metathesaurus mapping uses explicit relationships between ICPC-2 PLUS and SNOMED CT terms in the UMLS library to perform the first stage. Computational linguistic mapping uses natural language processing techniques and lexical similarities for the second stage. Finally, post-coordination mapping allows one ICPC-2 PLUS term to be mapped to an aggregation of two or more SNOMED CT terms. RESULTS A total of 5,971 of the 7,410 ICPC-2 PLUS terms (80.58%) were mapped to SNOMED CT using the three stages, although with different levels of accuracy. UMLS mapping mapped 53.0% of ICPC-2 PLUS terms to SNOMED CT with a precision of 96.46% and an overall recall of 44.89%. Lexical mapping increased the result to 60.31%, and post-coordination mapping gave a further increase of 20.27% in mapped terms. A manual review of part of the mapping shows that the precision of the lexical mappings is around 90%; the accuracy of post-coordination has not yet been evaluated. Unmapped and mismatched terms are due to differences in structure between ICPC-2 PLUS and SNOMED CT; terms contained in ICPC-2 PLUS but not in SNOMED CT caused a large proportion of the mapping failures. CONCLUSION Mapping terminologies to a standard vocabulary is a way to facilitate consistent medical data exchange and to achieve system interoperability and data standardisation. Broad-scale mapping cannot be achieved by any single method, and methods based on computational linguistics can be very useful for the task. Automating as much of this process as possible turns the searching and mapping task into a validation task, which can effectively reduce the cost and increase the efficiency and accuracy of the task compared with manual methods.
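A minimal sketch of the lexical-mapping stage described above: score candidate target terms against a source term by token overlap (Jaccard similarity) after simple normalisation, accepting the best candidate above a threshold. The term strings and threshold are illustrative, not drawn from the actual terminologies.

```python
# Token-overlap (Jaccard) matching between a source term and candidates.

def tokens(term: str) -> set:
    return set(term.lower().replace(",", " ").split())

def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def best_match(source: str, candidates, threshold: float = 0.5):
    """Return the highest-scoring candidate, or None below the threshold."""
    score, match = max((jaccard(source, c), c) for c in candidates)
    return match if score >= threshold else None

candidates = ["fracture of femur", "pain in limb", "abdominal pain"]
print(best_match("pain abdominal", candidates))  # → abdominal pain
```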


Journal of the American Medical Informatics Association | 2011

A knowledge discovery and reuse pipeline for information extraction in clinical notes

Jon Patrick; Dung H M Nguyen; Yefeng Wang; Min Li

OBJECTIVE Information extraction and classification of clinical data are current challenges in natural language processing. This paper presents a cascaded method to deal with three different extraction and classification tasks in clinical data: concept annotation, assertion classification and relation classification. MATERIALS AND METHODS A pipeline system was developed for clinical natural language processing that includes a proofreading process, with gold-standard reflexive validation and correction. The information extraction system is a combination of a machine learning approach and a rule-based approach. The outputs of this system were evaluated on all three tiers of the fourth i2b2/VA shared-task and workshop challenge. RESULTS Overall concept classification attained an F-score of 83.3% against a baseline of 77.0%, the optimal F-score for assertions about the concepts was 92.4%, and the relation classifier attained 72.6% for relationships between clinical concepts against a baseline of 71.0%. Micro-averaged results for the challenge test set were 81.79%, 91.90% and 70.18%, respectively. DISCUSSION The multi-task challenge requires a distribution of time and workload across the individual tasks, so an overall performance evaluation on all three tasks is more informative than treating each task assessment as independent. The simplicity of the model developed in this work should be contrasted with the very large feature spaces of other participants in the challenge, who achieved only slightly better performance. When comparing results, there is a need to charge a penalty against the complexity of a model, as defined in message minimisation theory. CONCLUSION A complete pipeline system for constructing language processing models that can be used to process multiple practical detection tasks over the language structures of clinical records is presented.
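The combination of rule-based and machine learning approaches described above can be sketched schematically: a rule pass decides the clear cases and a statistical classifier handles whatever the rules leave undecided. The rules and labels below are hypothetical placeholders, not the paper's actual components.

```python
# A rules-first cascade for assertion classification: regex rules fire on
# unambiguous patterns; everything else falls through to a learned model.
import re

RULES = [
    (re.compile(r"\bno evidence of\b", re.I), "absent"),
    (re.compile(r"\bpossible\b", re.I), "possible"),
]

def rule_classify(text):
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return None  # undecided: fall through to the learner

def statistical_classify(text):
    # Stand-in for a trained model (CRFs/SVMs in systems of this kind).
    return "present"

def classify_assertion(text):
    return rule_classify(text) or statistical_classify(text)

print(classify_assertion("No evidence of pneumonia"))       # → absent
print(classify_assertion("pneumonia in right lower lobe"))  # → present
```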


North American Chapter of the Association for Computational Linguistics | 2003

Meta-learning orthographic and contextual models for language independent named entity recognition

Robert Munro; Daren Ler; Jon Patrick

This paper presents a named entity classification system that utilises both orthographic and contextual information. The random subspace method was employed to generate and refine attribute models. Supervised and unsupervised learning techniques were used in the recombination of models to produce the final results.
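The random subspace idea mentioned above can be sketched as follows: train several simple learners, each on a randomly chosen subset of the features, and combine them by majority vote. The toy orthographic features and the OneR-style base learner are illustrative only, not the paper's models.

```python
# Random subspace ensemble over a trivial one-feature rule learner.
import random
from collections import Counter, defaultdict

def train_oner(X, y, feat):
    """OneR-style rule: predict the majority label per value of one feature."""
    table = defaultdict(Counter)
    for xi, yi in zip(X, y):
        table[xi[feat]][yi] += 1
    return {v: c.most_common(1)[0][0] for v, c in table.items()}

def subspace_ensemble(X, y, n_models=5, seed=0):
    rng = random.Random(seed)
    n_feats = len(X[0])
    # Each model sees a random feature subspace (size 1, for simplicity).
    return [(f, train_oner(X, y, f))
            for f in (rng.randrange(n_feats) for _ in range(n_models))]

def predict(models, x, default="O"):
    votes = Counter(rule.get(x[feat], default) for feat, rule in models)
    return votes.most_common(1)[0][0]

# Toy orthographic features per token: (is_capitalised, has_digit).
X = [(1, 0), (0, 0), (1, 0), (0, 1)]
y = ["ENT", "O", "ENT", "O"]
models = subspace_ensemble(X, y)
print(predict(models, (1, 0)))  # → ENT
```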


Journal of Biomedical Informatics | 2012

An ontology for clinical questions about the contents of patient notes

Jon Patrick; Min Li

OBJECTIVE Many studies have been completed on question classification in the open domain, but only limited work focuses on the medical domain, and to the best of our knowledge most existing medical question classifications were designed for literature-based question answering systems. This paper focuses on a new direction: designing a novel question processing and classification model for answering clinical questions applied to electronic patient notes. METHODS There are four main steps in the work. First, a relatively large set of clinical questions was collected from staff in an Intensive Care Unit. Then, a clinical question taxonomy was designed for question answering purposes. Subsequently, an annotation guideline was created and used to annotate the question set. Finally, a multilayer classification model was built to classify the clinical questions. RESULTS Through the initial classification experiments, we realised that general features cannot deliver high performance for a minimal classifier (a small data set with multiple classes). Thus, an automatic knowledge discovery and knowledge reuse process was designed to boost performance by extracting and expanding the specific features of the questions. In the evaluation, the results show that around 90% accuracy can be achieved in the answerable-subclass classification and in generic question template classification. On the other hand, the machine learning method does not perform well in identifying the category of unanswerable questions, due to the asymmetric class distribution. CONCLUSIONS In this paper, a comprehensive study of clinical questions has been completed. A major outcome of this work is the multilayer classification model, which serves as a major component of a patient-records-based clinical question answering system as our studies continue. In addition, the question collections can be reused by the research community to improve the efficiency of their own question answering systems.
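A schematic two-layer classifier in the spirit of the multilayer model described above: a first layer decides answerable vs unanswerable, and a second layer assigns a question template. The rules and template names are invented placeholders, not the paper's taxonomy.

```python
# Layered question classification: layer 1 gates layer 2.

def layer1_answerable(question: str) -> bool:
    # Stand-in for the answerable-subclass classifier.
    return any(w in question.lower() for w in ("what", "when", "which"))

def layer2_template(question: str) -> str:
    # Stand-in for generic question-template classification.
    q = question.lower()
    if q.startswith("when"):
        return "TEMPORAL"
    if "dose" in q or "medication" in q:
        return "MEDICATION"
    return "OTHER"

def classify(question: str):
    if not layer1_answerable(question):
        return ("unanswerable", None)
    return ("answerable", layer2_template(question))

print(classify("When was the last dose of heparin given?"))
# → ('answerable', 'TEMPORAL')
```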


International Conference on Computational Linguistics | 2002

SLINERC: the Sydney Language-Independent Named Entity Recogniser and Classifier

Jon Patrick; Casey Whitelaw; Robert Munro

The Sydney Language-Independent Named Entity Recogniser and Classifier (SLINERC) is a multi-stage system for the recognition and classification of named entities. Each stage uses a decision graph learner to combine statistical features with results from prior stages. Earlier stages focus on entity recognition, the separation of entities from non-entity terms. Later stages concentrate on the classification of these entities into the desired classes. The best overall F-values are 73.92 and 71.36 for the Spanish and Dutch datasets, respectively.
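The staged design described above, where each stage's output becomes a feature for the next, can be sketched schematically. The stage functions below are trivial placeholders standing in for the decision graph learners.

```python
# Stage chaining: each stage appends its prediction to the feature dict
# consumed by later stages.

def stage1_recognise(features):
    # Entity vs non-entity (hypothetical rule on a capitalisation flag).
    return "ENT" if features["capitalised"] else "O"

def stage2_classify(features):
    # Classify recognised entities, consuming the prior stage's output.
    if features["stage1"] != "ENT":
        return "O"
    return "LOC" if features["gazetteer_loc"] else "MISC"

def run_pipeline(token_features):
    feats = dict(token_features)
    feats["stage1"] = stage1_recognise(feats)
    feats["stage2"] = stage2_classify(feats)
    return feats["stage2"]

print(run_pipeline({"capitalised": True, "gazetteer_loc": True}))  # → LOC
```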


Journal of the American Medical Informatics Association | 2014

Supervised machine learning and active learning in classification of radiology reports

Dung H M Nguyen; Jon Patrick

OBJECTIVE This paper presents an automated system for classifying the results of imaging examinations (CT, MRI, positron emission tomography) into reportable and non-reportable cancer cases. This system is part of an industrial-strength processing pipeline built to extract content from radiology reports for use in the Victorian Cancer Registry. MATERIALS AND METHODS In addition to traditional supervised learning methods such as conditional random fields and support vector machines, active learning (AL) approaches were investigated to optimize training production and further improve classification performance. The project involved two pilot sites in Victoria, Australia (Lake Imaging (Ballarat) and Peter MacCallum Cancer Centre (Melbourne)) and, in collaboration with the NSW Central Registry, one pilot site at Westmead Hospital (Sydney). RESULTS The reportability classifier achieved 98.25% sensitivity and 96.14% specificity on the cancer registry's held-out test set. Up to 92% of the training data needed for supervised machine learning can be saved by AL. DISCUSSION AL is a promising method for optimizing the supervised training production used in classification of radiology reports. When an AL strategy is applied during the data selection process, the cost of manual classification can be reduced significantly. CONCLUSIONS The most important practical application of the reportability classifier is that it can dramatically reduce human effort in identifying relevant reports from the large imaging pool for further investigation of cancer. The classifier is built on a large real-world dataset and can achieve high performance in filtering relevant reports to support cancer registries.
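Pool-based active learning with uncertainty sampling, the general AL strategy family the abstract refers to, can be sketched as below. The "model" is a trivial keyword scorer standing in for a trained classifier's probability output; all reports are invented.

```python
# Uncertainty sampling: label the pool items the model is least sure about.

def score(report: str) -> float:
    """Toy reportability score in [0, 1]; a real system would use a
    trained classifier's probability output."""
    keywords = ("carcinoma", "metastasis", "malignant")
    hits = sum(k in report.lower() for k in keywords)
    return min(1.0, hits / 2)

def most_uncertain(pool, k=1):
    """Pick the k reports whose scores are closest to 0.5 — the ones the
    current model is least confident about — for manual labelling."""
    return sorted(pool, key=lambda r: abs(score(r) - 0.5))[:k]

pool = [
    "No malignant cells identified.",       # score 0.5 -> most uncertain
    "Metastasis with carcinoma in situ.",   # score 1.0 -> confident
    "Normal chest x-ray.",                  # score 0.0 -> confident
]
print(most_uncertain(pool))  # → ['No malignant cells identified.']
```

In a full AL loop, the selected reports would be labelled by a human, added to the training set, and the model retrained before the next selection round.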


North American Chapter of the Association for Computational Linguistics | 2003

Named entity recognition using a character-based probabilistic approach

Casey Whitelaw; Jon Patrick

We present a named entity recognition and classification system that uses only probabilistic character-level features. Classifications by multiple orthographic tries are combined in a hidden Markov model framework to incorporate both internal and contextual evidence. As part of the system, we perform a preprocessing stage in which capitalisation is restored to sentence-initial and all-caps words with high accuracy. We report F-values of 86.65 and 79.78 for the English datasets, and 50.62 and 54.43 for the German datasets.
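The orthographic-trie idea described above can be sketched minimally: a trie records, for each character prefix seen in training, how often it occurs in each entity class, so a new word can be scored by its longest known prefix. The training words and classes are invented; the full system combines many such tries in an HMM.

```python
# A character-prefix trie with per-class counts.
from collections import Counter

def build_trie(examples):
    trie = {}
    for word, cls in examples:
        for i in range(1, len(word) + 1):
            trie.setdefault(word[:i], Counter())[cls] += 1
    return trie

def classify(trie, word, default="O"):
    """Score a word by the longest prefix observed in training."""
    for i in range(len(word), 0, -1):
        if word[:i] in trie:
            return trie[word[:i]].most_common(1)[0][0]
    return default

examples = [("London", "LOC"), ("Lisbon", "LOC"), ("Lisa", "PER")]
trie = build_trie(examples)
print(classify(trie, "Lisaa"))  # → PER (longest known prefix is "Lisa")
```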


Meeting of the Association for Computational Linguistics | 2007

Developing Feature Types for Classifying Clinical Notes

Jon Patrick; Yitao Zhang; Yefeng Wang

This paper proposes a machine learning approach to the task of assigning ICD-9-CM codes (the international standard classification of diseases) to clinical records. By treating the task as a text categorisation problem, a classification system was built that explores a variety of features, including negation and different strategies for measuring gloss overlap between the content of clinical records and ICD-9-CM code descriptions, together with expansion of the glosses from the ICD-9-CM hierarchy. The best classifier achieved an overall F1 value of 88.2 on a data set of 978 free-text clinical records, better than the performance of two of the three human annotators.
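The gloss-overlap feature type mentioned above can be sketched as counting shared content words between a record and each code description. The glosses below are shortened paraphrases for illustration, not the official ICD-9-CM descriptions, and the stopword list is a minimal stand-in.

```python
# Gloss-overlap features: shared content words between record and gloss.

STOPWORDS = {"of", "the", "and", "with", "in", "a"}

def content_words(text: str) -> set:
    return {w for w in text.lower().split() if w not in STOPWORDS}

def gloss_overlap(record: str, gloss: str) -> int:
    return len(content_words(record) & content_words(gloss))

glosses = {
    "786.2": "cough",
    "780.6": "fever",
}
record = "persistent cough with mild fever"
features = {code: gloss_overlap(record, g) for code, g in glosses.items()}
print(features)  # → {'786.2': 1, '780.6': 1}
```

In a real system these counts would be one feature group among many fed to the text categoriser, alongside negation and hierarchy-expanded glosses.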


Conference on Computational Natural Language Learning | 2001

Boosted decision graphs for NLP learning tasks

Jon Patrick; Ishaan Goyal

This paper reports the implementation of DRAPH-GP, an extension of the decision graph algorithm DGRAPH-OW that uses the AdaBoost algorithm. This algorithm, which we call 1-Stage Boosting, is shown to improve the accuracy of decision graphs, as does another technique we combine with AdaBoost and call 2-Stage Boosting, which shows greater improvement. Empirical tests demonstrate that both 1-Stage and 2-Stage Boosting perform better than the boosted C4.5 algorithm (C5.0). Boosting has shown itself competitive with memory-based methods for NLP tasks with a highly disjunctive attribute space, and potentially better as part of a Hierarchical Multi-Method Classifier. An explanation for the effectiveness of boosting in terms of a poor choice of prior probabilities is presented.

In a wide variety of classification problems, boosting techniques have proven an effective method to significantly reduce the error of weak learning algorithms. While the AdaBoost algorithm (Freund & Schapire, 1995) has been used to improve the accuracy of a decision tree algorithm (Quinlan & Rivest, 1989) that uses the Minimum Description Length Principle (MDL), little is known about its effectiveness on decision graphs.
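The general AdaBoost scheme (Freund & Schapire) that the paper applies to decision graphs can be sketched with decision stumps as the weak learner. The data are toy one-dimensional values with labels in {-1, +1}; this illustrates the boosting loop, not the paper's DRAPH-GP implementation.

```python
# AdaBoost over threshold stumps: repeatedly fit the weighted-error-minimising
# stump, weight it by its accuracy, and up-weight misclassified examples.
import math

def stump_predict(threshold, sign, x):
    return sign if x > threshold else -sign

def adaboost(X, y, thresholds, rounds=3):
    n = len(X)
    w = [1.0 / n] * n            # example weights
    ensemble = []                # (alpha, threshold, sign) triples
    for _ in range(rounds):
        best = None
        for t in thresholds:
            for sign in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump_predict(t, sign, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, t, sign)
        err, t, sign = best
        err = max(err, 1e-10)                    # avoid log(inf) on zero error
        alpha = 0.5 * math.log((1 - err) / err)  # stump weight
        ensemble.append((alpha, t, sign))
        # Re-weight: misclassified examples gain weight, then normalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(t, sign, xi))
             for wi, xi, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(t, s, x) for a, t, s in ensemble)
    return 1 if score > 0 else -1

X = [1, 2, 3, 8, 9, 10]
y = [-1, -1, -1, 1, 1, 1]
ens = adaboost(X, y, thresholds=[2.5, 5.5, 8.5])
print(predict(ens, 9))  # → 1
```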

Collaboration


Dive into Jon Patrick's collaborations.

Top Co-Authors

Min Li

University of Sydney

Yefeng Wang

Information Technology University

Ying Ou

University of Sydney