George Gkotsis
King's College London
Publication
Featured research published by George Gkotsis.
web science | 2014
George Gkotsis; Karen Stepanyan; Carlos Pedrinaci; John Domingue; Maria Liakata
This paper addresses the problem of determining the best answer in Community-based Question Answering websites by focussing on the content. Previous research on this topic relies on the exploitation of community feedback on the answers, which involves rating of either users (e.g., reputation) or answers (e.g. scores manually assigned to answers). We propose a new technique that leverages the content/textual features of answers in a novel way. Our approach delivers better results than related linguistics-based solutions and manages to match rating-based approaches. More specifically, the gain in performance is achieved by rendering the values of these features into a discretised form. We also show how our technique manages to deliver equally good results in real-time settings, as opposed to having to rely on information not always readily available, such as user ratings and answer scores. We ran an evaluation on 21 StackExchange websites covering around 4 million questions and more than 8 million answers. We obtain 84% average precision and 70% recall, which shows that our technique is robust, effective, and widely applicable.
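The abstract above attributes the performance gain to rendering feature values into a discretised form. A minimal sketch of one possible reading follows, assuming that discretisation means replacing a raw value with its rank among the answers to the same question. The function name, the record fields and the `length` feature are hypothetical illustrations, not taken from the paper.

```python
from collections import defaultdict


def discretise_by_rank(answers, feature):
    """Map each answer id to the rank of its feature value among the
    answers to the same question (1 = largest value).

    Assumption: rank-within-question is only one plausible form of
    discretisation; the paper's exact scheme is not given in the abstract.
    """
    by_question = defaultdict(list)
    for answer in answers:
        by_question[answer["question_id"]].append(answer)

    ranks = {}
    for group in by_question.values():
        # Sort descending so the largest raw value receives rank 1.
        ordered = sorted(group, key=lambda a: a[feature], reverse=True)
        for rank, answer in enumerate(ordered, start=1):
            ranks[answer["answer_id"]] = rank
    return ranks


# Hypothetical toy data: three answers to q1, one answer to q2.
answers = [
    {"answer_id": "a1", "question_id": "q1", "length": 120},
    {"answer_id": "a2", "question_id": "q1", "length": 480},
    {"answer_id": "a3", "question_id": "q1", "length": 75},
    {"answer_id": "a4", "question_id": "q2", "length": 300},
]
length_rank = discretise_by_rank(answers, "length")
```

A rank of this kind is comparable across questions even when raw values differ widely in scale, which is one reason a discretised feature might help a classifier. That motivation is offered here as an assumption rather than as the authors' stated rationale.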
north american chapter of the association for computational linguistics | 2016
George Gkotsis; Anika Oellrich; Tim Hubbard; Richard Dobson; Maria Liakata; Sumithra Velupillai; Rina Dutta
Online social media, such as Reddit, has become an important resource to share personal experiences and communicate with others. Among other personal information, some social media users communicate about mental health problems they are experiencing, with the intention of getting advice, support or empathy from other users. Here, we investigate the language of Reddit posts specific to mental health, to define linguistic characteristics that could be helpful for further applications. The latter include attempting to identify posts that need urgent attention due to their nature, e.g. when someone announces their intentions of ending their life by suicide or harming others. Our results show that there are a variety of linguistic features that are discriminative across mental health user communities and that can be further exploited in subsequent classification tasks. Furthermore, while negative sentiment is almost uniformly expressed across the entire data set, we demonstrate that there are also condition-specific vocabularies used in social media to communicate about particular disorders. Source code and related materials are available from: https://github.com/gkotsis/reddit-mental-health.
north american chapter of the association for computational linguistics | 2016
George Gkotsis; Sumithra Velupillai; Anika Oellrich; Harry Dean; Maria Liakata; Rina Dutta
Mental Health Records (MHRs) contain free-text documentation about patients’ suicide and suicidality. In this paper, we address the problem of determining whether grammatical variants (inflections) of the word “suicide” are affirmed or negated. To achieve this, we populate and annotate a dataset with over 6,000 sentences originating from a large repository of MHRs. The resulting dataset has high Inter-Annotator Agreement (0.93). Furthermore, we develop and propose a negation detection method that leverages syntactic features of text. Using parse trees, we build a set of basic rules that rely on minimum domain knowledge and render the problem as binary classification (affirmed vs. negated). Since the overall goal is to identify patients who are expected to be at high risk of suicide, we focus on the evaluation of positive (affirmed) cases as determined by our classifier. Our negation detection approach yields a recall (sensitivity) value of 94.6% for the positive cases and an overall accuracy value of 91.9%. We believe that our approach can be integrated with other clinical Natural Language Processing tools in order to further advance information extraction capabilities.
Scientific Reports | 2017
George Gkotsis; Anika Oellrich; Sumithra Velupillai; Maria Liakata; Tim Hubbard; Richard Dobson; Rina Dutta
The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients’ own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of ‘in the moment’ daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balanced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions.
web science | 2015
George Gkotsis; Maria Liakata; Carlos Pedrinaci; Karen Stepanyan; John Domingue
This paper addresses the problem of determining the best answer in Community-based Question Answering (CQA) websites by focussing on the content. In particular, we present a novel system, ACQUA (http://acqua.kmi.open.ac.uk), that can be installed onto the majority of browsers as a plugin. The service offers a seamless and accurate prediction of the answer to be accepted. Our system is based on a novel approach for processing answers in CQAs. Previous research on this topic relies on the exploitation of community feedback on the answers, which involves rating of either users (e.g., reputation) or answers (e.g. scores manually assigned to answers). We propose a new technique that leverages the content/textual features of answers in a novel way. Our approach delivers better results than related linguistics-based solutions and manages to match rating-based approaches. More specifically, the gain in performance is achieved by rendering the values of these features into a discretised form. We also show how our technique manages to deliver equally good results in real-time settings, as opposed to having to rely on information not always readily available, such as user ratings and answer scores. We ran an evaluation on 21 StackExchange websites covering around 4 million questions and more than 8 million answers. We obtain 84% average precision and 70% recall, which shows that our technique is robust, effective, and widely applicable.
international semantic web conference | 2016
Allan Third; George Gkotsis; Eleni Kaldoudi; George Drosatos; Nick Portokallidis; Stefanos Roumeliotis; Kalliopi Pafili; John Domingue
The assessment of risk in medicine is a crucial task, and depends on scientific knowledge derived by systematic clinical studies on factors affecting health, as well as on particular knowledge about the current status of a particular patient. Existing non-semantic risk prediction tools are typically based on hardcoded scientific knowledge, and only cover a very limited range of patient states. This makes them rapidly out of date, and limited in application, particularly for patients with multiple co-occurring conditions. In this work we propose an integration of Semantic Web and Quantified Self technologies to create a framework for calculating clinical risk predictions for patients based on self-gathered biometric data. This framework relies on generic, reusable ontologies for representing clinical risk, and sensor readings, and reasoning to support the integration of data represented according to these ontologies. The implemented framework shows a wide range of advantages over existing risk calculation.
F1000Research | 2018
Richard Jackson; Rashmi Patel; Sumithra Velupillai; George Gkotsis; David Hoyle; Robert Stewart
Background: Deep Phenotyping is the precise and comprehensive analysis of phenotypic features in which the individual components of the phenotype are observed and described. In UK mental health clinical practice, most clinically relevant information is recorded as free text in the Electronic Health Record, and offers a granularity of information beyond what is expressed in most medical knowledge bases. The SNOMED CT nomenclature potentially offers the means to model such information at scale, yet given a sufficiently large body of clinical text collected over many years, it is difficult to identify the language that clinicians favour to express concepts. Methods: By utilising a large corpus of healthcare data, we sought to make use of semantic modelling and clustering techniques to represent the relationship between the clinical vocabulary of internationally recognised SMI symptoms and the preferred language used by clinicians within a care setting. We explore how such models can be used for discovering novel vocabulary relevant to the task of phenotyping Serious Mental Illness (SMI) with only a small amount of prior knowledge. Results: 20 403 terms were derived and curated via a two-stage methodology. The list was reduced to 557 putative concepts based on eliminating redundant information content. These were then organised into 9 distinct categories pertaining to different aspects of psychiatric assessment. 235 concepts were found to be expressions of putative clinical significance. Of these, 53 were identified as having novel synonymy with existing SNOMED CT concepts. 106 had no mapping to SNOMED CT. Conclusions: We demonstrate a scalable approach to discovering new concepts of SMI symptomatology based on real-world clinical observation. Such approaches may offer the opportunity to consider broader manifestations of SMI symptomatology than is typically assessed via current diagnostic frameworks, and create the potential for enhancing nomenclatures such as SNOMED CT based on real-world expressions.
Scientific Reports | 2017
George Gkotsis; Anika Oellrich; Sumithra Velupillai; Maria Liakata; Tim Hubbard; Richard Dobson; Rina Dutta
Scientific Reports 7: Article number: 45141; published online: 22 March 2017; updated: 16 May 2017. One of the Supplementary Information files that accompany this study was inadvertently omitted in the original version of this Article. The link to this Supplementary Information file has now been added to the HTML version of the Article.
Archive | 2015
Allan Third; Eleni Kaldoudi; George Gkotsis; Stefanos Roumeliotis; Kalliope Pafili; John Domingue
European Psychiatry | 2016
Anna Kolliakou; Michael Ball; Leon Derczynski; David Chandran; George Gkotsis; Paolo Deluca; Richard Jackson; Hitesh Shetty; Robert Stewart