Publication


Featured research published by George Karystianis.


eLife | 2016

Bias in the reporting of sex and age in biomedical research on mouse models

Oscar Flórez-Vargas; Andy Brass; George Karystianis; Michael Bramhall; Robert Stevens; Sheena M. Cruickshank; Goran Nenadic

In animal-based biomedical research, both the sex and the age of the animals studied affect disease phenotypes by modifying their susceptibility, presentation and response to treatment. The accurate reporting of experimental methods and materials, including the sex and age of animals, is essential so that other researchers can build on the results of such studies. Here we use text mining to study 15,311 research papers in which mice were the focus of the study. We find that the percentage of papers reporting the sex and age of mice has increased over the past two decades: however, only about 50% of the papers published in 2014 reported these two variables. We also compared the quality of reporting in six preclinical research areas and found evidence for different levels of sex-bias in these areas: the strongest male-bias was observed in cardiovascular disease models and the strongest female-bias was found in infectious disease models. These results demonstrate the ability of text mining to contribute to the ongoing debate about the reproducibility of research, and confirm the need to continue efforts to improve the reporting of experimental methods and materials. DOI: http://dx.doi.org/10.7554/eLife.13615.001


Journal of Biomedical Informatics | 2015

Combining knowledge- and data-driven methods for de-identification of clinical narratives

Azad Dehghan; Aleksandar Kovačević; George Karystianis; John A. Keane; Goran Nenadic

A recent promise to access unstructured clinical data from electronic health records on a large scale has revitalized the interest in automated de-identification of clinical notes, which includes the identification of mentions of Protected Health Information (PHI). We describe the methods developed and evaluated as part of the i2b2/UTHealth 2014 challenge to identify PHI defined by 25 entity types in longitudinal clinical narratives. Our approach combines knowledge-driven (dictionaries and rules) and data-driven (machine learning) methods with a large range of features to address de-identification of specific named entities. In addition, we have devised a two-pass recognition approach that creates a patient-specific run-time dictionary from the PHI entities identified in the first step with high confidence, which is then used in the second pass to identify mentions that lack specific clues. The proposed method achieved overall micro F1-measures of 91% on strict and 95% on token-level evaluation on the test dataset (514 narratives). Whilst most PHI entities can be reliably identified, particularly challenging were mentions of Organizations and Professions. Still, the overall results suggest that automated text mining methods can be used to reliably process clinical notes to identify personal information and thus provide a crucial step in large-scale de-identification of unstructured data for further clinical and epidemiological studies.
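The two-pass idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the title-based regular expression, the `[NAME]` placeholder, and the function names `first_pass`, `second_pass`, and `deidentify` are all hypothetical choices made for illustration, and only surnames are handled.

```python
import re

# Hypothetical first-pass rule (an assumption, not from the paper):
# a title followed by a capitalised word is treated as a high-confidence surname.
TITLE_NAME = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.?\s+([A-Z][a-z]+)")


def first_pass(notes):
    """Build a patient-specific dictionary from high-confidence matches."""
    names = set()
    for note in notes:
        names.update(TITLE_NAME.findall(note))
    return names


def second_pass(note, dictionary):
    """Mask every whole-word dictionary entry, even without a contextual clue."""
    if not dictionary:
        return note
    alternatives = "|".join(re.escape(name) for name in sorted(dictionary))
    return re.sub(r"\b(?:" + alternatives + r")\b", "[NAME]", note)


def deidentify(notes):
    """Run both passes over all notes belonging to one patient."""
    dictionary = first_pass(notes)
    return [second_pass(note, dictionary) for note in notes]


notes = [
    "Dr. Smith reviewed the patient.",
    "Smith recommends follow-up in two weeks.",
]
print(deidentify(notes))
```

In the second note the surname has no title to flag it, so a single-pass rule would miss it; the run-time dictionary built from the first note is what allows it to be masked.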


Journal of Biomedical Informatics | 2015

Using local lexicalized rules to identify heart disease risk factors in clinical notes

George Karystianis; Azad Dehghan; Aleksandar Kovačević; John A. Keane; Goran Nenadic

Heart disease is the leading cause of death globally and a significant part of the human population lives with it. A number of risk factors have been recognized as contributing to the disease, including obesity, coronary artery disease (CAD), hypertension, hyperlipidemia, diabetes, smoking, and family history of premature CAD. This paper describes and evaluates a methodology to extract mentions of such risk factors from diabetic clinical notes, which was a task of the i2b2/UTHealth 2014 Challenge in Natural Language Processing for Clinical Data. The methodology is knowledge-driven and the system implements local lexicalized rules (based on syntactical patterns observed in notes) combined with manually constructed dictionaries that characterize the domain. A part of the task was also to detect the time interval in which the risk factors were present in a patient. The system was applied to an evaluation set of 514 unseen notes and achieved a micro-average F-score of 88% (with 86% precision and 90% recall). While the identification of CAD family history, medication and some of the related disease factors (e.g. hypertension, diabetes, hyperlipidemia) showed quite good results, the identification of CAD-specific indicators proved to be more challenging (F-score of 74%). Overall, the results are encouraging and suggest that automated text mining methods can be used to process clinical notes to identify risk factors and monitor progression of heart disease on a large scale, providing necessary data for clinical and epidemiological studies.
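As a quick check on how the reported figures relate, the F-score is the harmonic mean of precision and recall, and micro-averaging pools the counts of all classes before computing them. The sketch below assumes the standard definitions; the function names and the per-class counts in the usage example are hypothetical and are not taken from the paper.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; with beta = 1 this is the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def micro_average(counts):
    """Pool (true positive, false positive, false negative) counts over all classes."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


# Precision and recall as reported in the abstract.
print(round(f_score(0.86, 0.90), 2))

# Hypothetical per-class counts, for illustration only.
print(micro_average([(90, 10, 5), (10, 5, 10)]))
```

With 86% precision and 90% recall the harmonic mean is about 0.8795, which rounds to the 88% stated in the abstract.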


PLOS Biology | 2017

Increasing efficiency of preclinical research by group sequential designs

Konrad Neumann; Ulrike Grittner; Sophie K. Piper; Andre Rex; Oscar Flórez-Vargas; George Karystianis; Alice Schneider; Ian Wellwood; Bob Siegerink; John P. A. Ioannidis; Jonathan Kimmelman; Ulrich Dirnagl

Despite the potential benefits of sequential designs, studies evaluating treatments or experimental manipulations in preclinical experimental biomedicine almost exclusively use classical block designs. Our aim with this article is to bring the existing methodology of group sequential designs to the attention of researchers in the preclinical field and to clearly illustrate its potential utility. Group sequential designs can offer higher efficiency than traditional methods and are increasingly used in clinical trials. Using simulation of data, we demonstrate that group sequential designs have the potential to improve the efficiency of experimental studies, even when sample sizes are very small, as is currently prevalent in preclinical experimental biomedicine. When simulating data with a large effect size of d = 1 and a sample size of n = 18 per group, sequential frequentist analysis consumes in the long run only around 80% of the planned number of experimental units. In larger trials (n = 36 per group), additional stopping rules for futility lead to the saving of resources of up to 30% compared to block designs. We argue that these savings should be invested to increase sample sizes and hence power, since the currently underpowered experiments in preclinical biomedicine are a major threat to the value and predictiveness in this research domain.
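The kind of simulation mentioned in this abstract can be sketched roughly as follows. The abstract does not state the number of interim looks, the stopping boundaries, or the test statistic used, so everything here is an assumption: three equally spaced looks, a Pocock boundary of 2.289 (two-sided alpha of 0.05), and a z-test with known unit variance. The output is therefore not expected to reproduce the paper's figure of around 80%.

```python
import random
from statistics import mean

# Pocock critical value for 3 equally spaced looks, two-sided alpha = 0.05.
# The choice of Pocock boundaries is an assumption of this sketch.
POCOCK_Z_3_LOOKS = 2.289


def simulate_trial(rng, effect=1.0, n_per_group=18, looks=3, z_crit=POCOCK_Z_3_LOOKS):
    """Simulate one experiment; return (units used per group, stopped for efficacy).

    Assumes n_per_group is divisible by looks and outcomes have unit variance.
    """
    step = n_per_group // looks
    treated, control = [], []
    for _ in range(looks):
        treated += [rng.gauss(effect, 1.0) for _ in range(step)]
        control += [rng.gauss(0.0, 1.0) for _ in range(step)]
        n = len(treated)
        # z statistic for a difference of means with known variance 1 in each group
        z = (mean(treated) - mean(control)) / (2.0 / n) ** 0.5
        if abs(z) >= z_crit:
            return n, True
    return n_per_group, False


def expected_fraction(runs=2000, seed=0, effect=1.0, n_per_group=18):
    """Average share of the planned sample size consumed over many simulated runs."""
    rng = random.Random(seed)
    used = [simulate_trial(rng, effect=effect, n_per_group=n_per_group)[0] for _ in range(runs)]
    return mean(used) / n_per_group


print(expected_fraction(effect=1.0))  # large effect: many runs stop early
print(expected_fraction(effect=0.0))  # no effect: almost all runs use the full sample
```

Because this sketch has no futility rule, savings arise only when a true effect exists; the abstract's larger savings in bigger trials rely on additional futility stopping, which is not modelled here.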


Journal of Biomedical Semantics | 2014

Mining characteristics of epidemiological studies from Medline: a case study in obesity

George Karystianis; Iain Buchan; Goran Nenadic

Background: The health sciences literature incorporates a relatively large subset of epidemiological studies that focus on population-level findings, including various determinants, outcomes and correlations. Extracting structured information about those characteristics would be useful for more complete understanding of diseases and for meta-analyses and systematic reviews. Results: We present an information extraction approach that enables users to identify key characteristics of epidemiological studies from MEDLINE abstracts. It extracts six types of epidemiological characteristic: design of the study, population that has been studied, exposure, outcome, covariates and effect size. We have developed a generic rule-based approach that has been designed according to semantic patterns observed in text, and tested it in the domain of obesity. Identified exposure, outcome and covariate concepts are clustered into health-related groups of interest. On a manually annotated test corpus of 60 epidemiological abstracts, the system achieved precision, recall and F-score between 79-100%, 80-100% and 82-96% respectively. We report the results of applying the method to a large scale epidemiological corpus related to obesity. Conclusions: The experiments suggest that the proposed approach could identify key epidemiological characteristics associated with a complex clinical problem from related abstracts. When integrated over the literature, the extracted data can be used to provide a more complete picture of epidemiological efforts, and thus support understanding via meta-analysis and systematic reviews.


Journal of Biomedical Informatics | 2017

Evaluation of a rule-based method for epidemiological document classification towards the automation of systematic reviews

George Karystianis; Kristina A. Thayer; Mary S. Wolfe; Guy Tsafnat

INTRODUCTION Most data extraction efforts in epidemiology are focused on obtaining targeted information from clinical trials. In contrast, limited research has been conducted on the identification of information from observational studies, a major source for human evidence in many fields, including environmental health. The recognition of key epidemiological information (e.g., exposures) through text mining techniques can assist in the automation of systematic reviews and other evidence summaries. METHOD We designed and applied a knowledge-driven, rule-based approach to identify targeted information (study design, participant population, exposure, outcome, confounding factors, and the country where the study was conducted) from abstracts of epidemiological studies included in several systematic reviews of environmental health exposures. The rules were based on common syntactical patterns observed in text and are thus not specific to any systematic review. To validate the general applicability of our approach, we compared the data extracted using our approach versus hand curation for 35 epidemiological study abstracts manually selected for inclusion in two systematic reviews. RESULTS The returned F-score, precision, and recall ranged from 70% to 98%, 81% to 100%, and 54% to 97%, respectively. The highest precision was observed for exposure, outcome and population (100%) while recall was best for exposure and study design with 97% and 89%, respectively. The lowest recall was observed for the population (54%), which also had the lowest F-score (70%). CONCLUSION Our text-mining approach showed encouraging performance in identifying targeted information from observational epidemiological study abstracts related to environmental exposures. We have demonstrated that rules based on generic syntactic patterns in one corpus can be applied to other observational study designs by simply interchanging the dictionaries aiming to identify certain characteristics (i.e., outcomes, exposures). At the document level, the recognised information can assist in the selection and categorization of studies included in a systematic review.


Journal of Biomedical Informatics | 2017

Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes

Azad Dehghan; Aleksandar Kovačević; George Karystianis; John A. Keane; Goran Nenadic

De-identification of clinical narratives is one of the main obstacles to making healthcare free text available for research. In this paper we describe our experience in expanding and tailoring two existing tools as part of the 2016 CEGS N-GRID Shared Tasks Track 1, which evaluated de-identification methods on a set of psychiatric evaluation notes for up to 25 different types of Protected Health Information (PHI). The methods we used rely on machine learning on either a large or small feature space, with additional strategies, including two-pass tagging and multi-class models, which both proved to be beneficial. The results show that the integration of the proposed methods can identify Health Insurance Portability and Accountability Act (HIPAA) defined PHIs with overall F1-scores of ∼90% and above. Yet, some classes (Profession, Organization) proved again to be challenging given the variability of expressions used to reference given information.


International Journal of Methods in Psychiatric Research | 2018

Automatic Mining of Symptom Severity from Psychiatric Evaluation Notes

George Karystianis; Alejo J. Nevado; Chi-Hun Kim; Azad Dehghan; John A. Keane; Goran Nenadic

As electronic mental health records become more widely available, several approaches have been suggested to automatically extract information from free‐text narrative aiming to support epidemiological research and clinical decision‐making. In this paper, we explore extraction of explicit mentions of symptom severity from initial psychiatric evaluation records. We use the data provided by the 2016 CEGS N‐GRID NLP shared task Track 2, which contains 541 records manually annotated for symptom severity according to the Research Domain Criteria.


Systematic Reviews | 2018

Automated screening of research studies for systematic reviews using study characteristics

Guy Tsafnat; Paul Glasziou; George Karystianis; Enrico Coiera

Background: Screening candidate studies for inclusion in a systematic review is time-consuming when conducted manually. Automation tools could reduce the human effort devoted to screening. Existing methods use supervised machine learning, which trains classifiers to identify relevant words in the abstracts of candidate articles that have previously been labelled by a human reviewer for inclusion or exclusion. Such classifiers typically reduce the number of abstracts requiring manual screening by about 50%. Methods: We extracted four key characteristics of observational studies (population, exposure, confounders and outcomes) from the text of titles and abstracts for all articles retrieved using search strategies from systematic reviews. Our screening method excluded studies if they did not meet a predefined set of characteristics. The method was evaluated using three systematic reviews. Screening results were compared to the actual inclusion list of the reviews. Results: The best screening threshold rule identified studies that mentioned both exposure (E) and outcome (O) in the study abstract. This screening rule excluded 93.7% of retrieved studies with a recall of 98%. Conclusions: Filtering studies for inclusion in a systematic review based on the detection of key study characteristics in abstracts significantly outperformed standard approaches to automated screening and appears worthy of further development and evaluation.
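The screening rule reported in this abstract (retain a study only when both an exposure and an outcome are detected) and its two evaluation measures can be sketched as below. The record fields `exposure`, `outcome`, and `included_in_review`, and the four toy records, are hypothetical; how the characteristics are detected in the abstract text is outside the scope of this sketch.

```python
def screen(studies):
    """Keep a study only if both an exposure and an outcome were detected."""
    return [s for s in studies if s["exposure"] and s["outcome"]]


def evaluate(studies, kept):
    """Return (share of retrieved studies excluded, recall against the review's inclusion list)."""
    kept_ids = {s["id"] for s in kept}
    relevant_ids = {s["id"] for s in studies if s["included_in_review"]}
    excluded_rate = 1 - len(kept) / len(studies)
    recall = len(kept_ids & relevant_ids) / len(relevant_ids) if relevant_ids else 0.0
    return excluded_rate, recall


# Toy records for illustration only.
studies = [
    {"id": 1, "exposure": True, "outcome": True, "included_in_review": True},
    {"id": 2, "exposure": True, "outcome": False, "included_in_review": False},
    {"id": 3, "exposure": False, "outcome": False, "included_in_review": False},
    {"id": 4, "exposure": True, "outcome": True, "included_in_review": False},
]
kept = screen(studies)
print(evaluate(studies, kept))
```

Study 4 shows why the exclusion rate and recall are reported together: the rule can retain studies that reviewers later reject, so a high recall alone says nothing about how much manual screening is saved.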


Journal of Medical Internet Research | 2018

Automatic Extraction of Mental Health Disorders From Domestic Violence Police Narratives: Text Mining Study

George Karystianis; Armita Adily; Peter W. Schofield; Lee Knight; Clara Galdon; David Greenberg; Louisa Jorm; Goran Nenadic; Tony Butler

Background Vast numbers of domestic violence (DV) incidents are attended by the New South Wales Police Force each year in New South Wales and recorded as both structured quantitative data and unstructured free text in the WebCOPS (Web-based interface for the Computerised Operational Policing System) database regarding the details of the incident, the victim, and person of interest (POI). Although the structured data are used for reporting purposes, the free text remains untapped for DV reporting and surveillance purposes. Objective In this paper, we explore whether text mining can automatically identify mental health disorders from this unstructured text. Methods We used a training set of 200 DV recorded events to design a knowledge-driven approach based on lexical patterns in text suggesting mental health disorders for POIs and victims. Results The precision returned from an evaluation set of 100 DV events was 97.5% and 87.1% for mental health disorders related to POIs and victims, respectively. After applying our approach to a large-scale corpus of almost a half million DV events, we identified 77,995 events (15.83%) that mentioned mental health disorders, with 76.96% (60,032/77,995) of those linked to POIs versus 16.47% (12,852/77,995) for the victims and 6.55% (5111/77,995) for both. Depression was the most common mental health disorder mentioned in both victims (22.25%, 3269) and POIs (18.70%, 8944), followed by alcohol abuse for POIs (12.19%, 5829) and various anxiety disorders (eg, panic disorder, generalized anxiety disorder) for victims (11.66%, 1714). Conclusions The results suggest that text mining can automatically extract targeted information from police-recorded DV events to support further public health research into the nexus between mental health disorders and DV.

Collaboration

Top Co-Authors

Goran Nenadic, University of Manchester

Azad Dehghan, University of Manchester

John A. Keane, University of Manchester

Andy Brass, University of Manchester

Iain Buchan, University of Manchester