Honghan Wu
King's College London
Publication
Featured research published by Honghan Wu.
Springer International Publishing | 2017
Jeff Z. Pan; Guido Vetere; José Manuél Gómez-Pérez; Honghan Wu
This book addresses the topic of exploiting enterprise-linked data with a particular focus on knowledge construction and accessibility within enterprises. It identifies the gaps between the requirements of enterprise knowledge consumption and standard data consuming technologies by analysing real-world use cases, and proposes the enterprise knowledge graph to fill such gaps. It provides concrete guidelines for effectively deploying linked-data graphs within and across business organizations. It is divided into three parts, focusing on the key technologies for constructing, understanding and employing knowledge graphs. Part 1 introduces basic background information and technologies, and presents a simple architecture to elucidate the main phases and tasks required during the lifecycle of knowledge graphs. Part 2 focuses on technical aspects; it starts with state-of-the-art knowledge-graph construction approaches, and then discusses exploration and exploitation techniques as well as advanced question-answering topics concerning knowledge graphs. Lastly, Part 3 demonstrates examples of successful knowledge graph applications in the media industry, healthcare and cultural heritage, and offers conclusions and future visions.
PLOS ONE | 2017
Ehtesham Iqbal; Robbie Mallah; Daniel Rhodes; Honghan Wu; Alvin Romero; Nynn Chang; Olubanke Dzahini; Chandra Pandey; Matthew Broadbent; Robert Stewart; Richard Dobson; Zina M. Ibrahim; Tudor Groza
Adverse drug events (ADEs) are unintended responses to medical treatment. They can greatly affect a patient’s quality of life and present a substantial burden on healthcare. Although electronic health records (EHRs) document a wealth of information relating to ADEs, this information is frequently stored in unstructured or semi-structured free-text narrative, requiring Natural Language Processing (NLP) techniques to mine it. Here we present a rule-based ADE detection and classification pipeline built and tested on a large psychiatric corpus comprising 264k patients, using the de-identified EHRs of four UK-based psychiatric hospitals. The pipeline uses characteristics specific to psychiatric EHRs to guide the annotation process, and distinguishes: a) the temporal value associated with the ADE mention (whether it is historical or present), b) the categorical value of the ADE (whether it is assertive, hypothetical, retrospective or a general discussion) and c) the implicit contextual value, where the status of the ADE is deduced from surrounding indicators rather than explicitly stated. We manually created the rulebase in collaboration with clinicians and pharmacists by studying ADE mentions in various types of clinical notes. We evaluated the open-source Adverse Drug Event annotation Pipeline (ADEPt) using 19 ADEs specific to antipsychotic and antidepressant medication. The ADEs chosen vary in severity, regularity and persistence. The average F-measure and accuracy achieved by our tool across all tested ADEs were 0.83 and 0.83 respectively. In addition to annotation power, the ADEPt pipeline presents an improvement to the state-of-the-art context-discerning algorithm, ConText.
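The temporal and categorical distinctions described in this abstract can be illustrated with a minimal rule-based sketch. This is not the ADEPt rulebase: the trigger phrases, the precedence of the rules and the function name `classify_mention` are all assumptions made for illustration, and negation and implicit context are not handled.

```python
import re
from typing import Optional

# Hypothetical trigger phrases, chosen for illustration only; the actual
# rulebase was curated manually with clinicians and pharmacists.
HISTORICAL = re.compile(r"\b(history of|previously|in the past)\b", re.I)
HYPOTHETICAL = re.compile(r"\b(if|may|might|could|risk of)\b", re.I)
GENERAL = re.compile(r"\b(discussed|advised about|informed about)\b", re.I)


def classify_mention(sentence: str, ade_term: str) -> Optional[dict]:
    """Label one ADE mention with a temporal and a categorical value.

    Returns None when the ADE term does not occur in the sentence.
    """
    if ade_term.lower() not in sentence.lower():
        return None

    # Temporal value: historical versus present.
    temporal = "historical" if HISTORICAL.search(sentence) else "present"

    # Categorical value; the rule order below is an assumed precedence.
    if GENERAL.search(sentence):
        category = "general discussion"
    elif HYPOTHETICAL.search(sentence):
        category = "hypothetical"
    elif temporal == "historical":
        category = "retrospective"
    else:
        category = "assertive"

    return {"temporal": temporal, "category": category}


if __name__ == "__main__":
    print(classify_mention("History of tremor on a previous antipsychotic.", "tremor"))
```

A real pipeline would scope each trigger to a window around the mention rather than to the whole sentence, which is the kind of refinement context algorithms such as ConText address.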
BMC Medical Informatics and Decision Making | 2018
Richard Jackson; Ismail Kartoglu; Clive Stringer; Genevieve Gorrell; Angus Roberts; Xingyi Song; Honghan Wu; Asha Agrawal; Kenneth Lui; Tudor Groza; Damian Lewsley; Doug Northwood; Amos Folarin; Robert Stewart; Richard Dobson
Background: Traditional health information systems are generally devised to support clinical data collection at the point of care. However, as the significance of the modern information economy expands in scope and permeates the healthcare domain, there is an increasing urgency for healthcare organisations to offer information systems that address the expectations of clinicians, researchers and the business intelligence community alike. Amongst other emergent requirements, the principal unmet need might be defined as the 3R principle (right data, right place, right time) to address deficiencies in organisational data flow while retaining the strict information governance policies that apply within the UK National Health Service (NHS). Here, we describe our work on creating and deploying a low-cost structured and unstructured information retrieval and extraction architecture within King’s College Hospital, the management of governance concerns and the associated use cases and cost saving opportunities that such components present. Results: To date, our CogStack architecture has processed over 300 million lines of clinical data, making it available for internal service improvement projects at King’s College London. On generated data designed to simulate real world clinical text, our de-identification algorithm achieved up to 94% precision and up to 96% recall. Conclusion: We describe a toolkit which we feel is of huge value to the UK (and beyond) healthcare community. It is the only open source, easily deployable solution designed for the UK healthcare environment, in a landscape populated by expensive proprietary systems. Solutions such as these provide a crucial foundation for the genomic revolution in medicine.
Exploiting Linked Data and Knowledge Graphs in Large Organisations | 2017
Alessandro Moschitti; Kateryna Tymoshenko; Panos Alexopoulos; Andrew D. Walker; Massimo Nicosia; Guido Vetere; Alessandro Faraotti; Marco Monti; Jeff Z. Pan; Honghan Wu; Yuting Zhao
In the Digital and Information Age, companies and government agencies are highly digitalized, as are the information exchanges that take place within their processes. They store information both as natural language text and as structured data, e.g., relational databases or knowledge graphs. In this scenario, methods for organizing, finding, and selecting relevant information, beyond the capabilities of classic Information Retrieval, remain active topics of research and development.
Database | 2017
Honghan Wu; Anika Oellrich; Christine Girges; Bernard de Bono; Tim Hubbard; Richard Dobson
Neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications, which reveals that in 85% of cases all highlighted sentences are relevant to the curation task and in about 65% of the cases, the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process. Database URL: https://github.com/KHP-Informatics/NapEasy
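The reported macro F1-measure and relative increase can be made concrete with a small sketch. The abstract does not say which units the macro average is taken over, so the sketch assumes one (true positive, false positive, false negative) triple per evaluated unit; the function names and the toy counts are hypothetical. The last helper simply inverts the stated 132% increase, which implies a baseline of roughly 0.22, given that both published figures are rounded.

```python
from typing import Iterable, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when precision and recall are undefined or zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: Iterable[Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-unit F1 scores (assumed averaging scheme)."""
    scores = [f1(tp, fp, fn) for tp, fp, fn in counts]
    return sum(scores) / len(scores)


def relative_increase_pct(new: float, baseline: float) -> float:
    """Percentage increase of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


def implied_baseline(new: float, increase_pct: float) -> float:
    """Baseline score implied by a score and its stated relative increase."""
    return new / (1.0 + increase_pct / 100.0)


if __name__ == "__main__":
    toy_counts = [(8, 2, 2), (1, 1, 3)]  # hypothetical counts, not study data
    print(round(macro_f1(toy_counts), 4))
    print(round(implied_baseline(0.51, 132.0), 2))
```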
Scientific Reports | 2018
Daniel M. Bean; Honghan Wu; Ehtesham Iqbal; Olubanke Dzahini; Zina M. Ibrahim; Matthew Broadbent; Robert Stewart; Richard Dobson
A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has been fixed in the paper.
Journal of the American Medical Informatics Association | 2018
Honghan Wu; Giulia Toti; Katherine I. Morley; Zina M. Ibrahim; Amos Folarin; Richard Jackson; Ismail Kartoglu; Asha Agrawal; Clive Stringer; Darren Gale; Genevieve Gorrell; Angus Roberts; Matthew Broadbent; Robert Stewart; Richard Dobson
Objective: Unlocking the data contained within both structured and unstructured components of electronic health records (EHRs) has the potential to provide a step change in data available for secondary research use, generation of actionable medical insights, hospital management, and trial recruitment. To achieve this, we implemented SemEHR, an open source semantic search and analytics tool for EHRs. Methods: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying contextualized mentions of a wide range of biomedical concepts within EHRs. Natural language processing annotations are further assembled at the patient level and extended with EHR-specific knowledge to generate a timeline for each patient. The semantic data are serviced via ontology-based search and analytics interfaces. Results: SemEHR has been deployed at a number of UK hospitals, including the Clinical Record Interactive Search, an anonymized replica of the EHR of the UK South London and Maudsley National Health Service Foundation Trust, one of Europe’s largest providers of mental health services. In 2 Clinical Record Interactive Search–based studies, SemEHR achieved 93% (hepatitis C) and 99% (HIV) F-measure results in identifying true positive patients. At King’s College Hospital in London, as part of the CogStack program (github.com/cogstack), SemEHR is being used to recruit patients into the UK Department of Health 100,000 Genomes Project (genomicsengland.co.uk). The validation study suggests that the tool can validate previously recruited cases and is very fast at searching phenotypes; time for recruitment criteria checking was reduced from days to minutes. Validated on open intensive care EHR data, Medical Information Mart for Intensive Care III, the vital signs extracted by SemEHR can achieve around 97% accuracy.
Conclusion: Results from the multiple case studies demonstrate SemEHR’s efficiency: weeks or months of work can be done within hours or minutes in some cases. SemEHR provides a more comprehensive view of patients, bringing in more and unexpected insight compared to study-oriented bespoke IE systems. SemEHR is open source, available at https://github.com/CogStack/SemEHR.
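The Methods step of assembling annotations at the patient level into a timeline can be sketched as a simple grouping and sorting operation. This is an illustration only and does not reflect SemEHR's actual data model or API: the `Annotation` fields, the context labels and both function names are assumptions, and the sample records are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Annotation:
    """One contextualized concept mention (hypothetical schema)."""
    patient_id: str
    doc_date: date
    concept: str   # e.g. an ontology concept label
    context: str   # e.g. "affirmed", "negated", "historical"


def build_timelines(annotations: Iterable[Annotation]) -> Dict[str, List[Annotation]]:
    """Group annotations by patient and order each group by document date."""
    timelines: Dict[str, List[Annotation]] = defaultdict(list)
    for ann in annotations:
        timelines[ann.patient_id].append(ann)
    for events in timelines.values():
        events.sort(key=lambda a: a.doc_date)
    return dict(timelines)


def patients_with(timelines: Dict[str, List[Annotation]], concept: str,
                  context: str = "affirmed") -> List[str]:
    """Return sorted IDs of patients with at least one matching mention."""
    return sorted(
        pid for pid, events in timelines.items()
        if any(e.concept == concept and e.context == context for e in events)
    )


# Invented sample records, for illustration only.
SAMPLE = [
    Annotation("p1", date(2016, 5, 2), "hepatitis C", "affirmed"),
    Annotation("p1", date(2015, 1, 9), "fatigue", "affirmed"),
    Annotation("p2", date(2016, 3, 4), "hepatitis C", "negated"),
]

if __name__ == "__main__":
    timelines = build_timelines(SAMPLE)
    print(patients_with(timelines, "hepatitis C"))
```

Keeping the context label on each mention is what lets a cohort query exclude negated or hypothetical mentions instead of matching on the concept alone.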
International Conference on Digital Health | 2017
Chandra Pandey; Zina M. Ibrahim; Honghan Wu; Ehtesham Iqbal; Richard Dobson
Electronic Health Record (EHR) narratives are a rich source of information, embedding high-resolution information of value to secondary research use. However, because EHRs are mostly natural language free-text and highly ambiguous, many natural language processing algorithms have been devised to extract meaningful structured information about clinical entities from them. The performance of these algorithms, however, varies widely depending on the training dataset and on how effectively background knowledge is used to steer the learning process. In this paper we study the impact of initializing the training of a neural network natural language processing algorithm with pre-defined clinical word embeddings to improve feature extraction and relationship classification between entities. We add our embedding framework to a bi-directional long short-term memory (Bi-LSTM) neural network, and further study the effect of using attention weights in neural networks for sequence labelling tasks to extract knowledge of Adverse Drug Reactions (ADRs). We incorporate unsupervised word embeddings using Word2Vec and GloVe from widely available medical resources such as the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpora and the Unified Medical Language System (UMLS), and also embed a pharmaco lexicon from available EHRs. Our algorithm, implemented using two datasets, shows that our architecture outperforms a baseline Bi-LSTM as well as Bi-LSTM networks using linear-chain and skip-chain conditional random fields (CRF).
Scientific Reports | 2017
Daniel M. Bean; Honghan Wu; Ehtesham Iqbal; Olubanke Dzahini; Zina M. Ibrahim; Matthew Broadbent; Robert Stewart; Richard Dobson
Unknown adverse reactions to drugs available on the market present a significant health risk and limit accurate judgement of the cost/benefit trade-off for medications. Machine learning has the potential to predict unknown adverse reactions from current knowledge. We constructed a knowledge graph containing four types of node: drugs, protein targets, indications and adverse reactions. Using this graph, we developed a machine learning algorithm based on a simple enrichment test and first demonstrated that this method performs extremely well at classifying known causes of adverse reactions (AUC 0.92). A cross validation scheme in which 10% of drug-adverse reaction edges were systematically deleted per fold showed that the method correctly predicts 68% of the deleted edges on average. Next, a subset of adverse reactions that could be reliably detected in anonymised electronic health records from South London and Maudsley NHS Foundation Trust was used to validate predictions from the model that are not currently known in public databases. High-confidence predictions were validated in electronic records significantly more frequently than random models, and outperformed standard methods (logistic regression, decision trees and support vector machines). This approach has the potential to improve patient safety by predicting adverse reactions that were not observed during randomised trials.
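The abstract describes the method only as "a simple enrichment test", so the following sketch is an assumption about what such a test could look like: a one-sided hypergeometric (Fisher-style) tail probability asking whether a graph feature, here a protein target, is over-represented among the drugs known to cause a given adverse reaction. The toy graph, drug names and function names are hypothetical.

```python
from math import comb
from typing import Dict, Set


def hypergeom_enrichment_p(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) for a hypergeometric X.

    N: drugs in total, K: drugs carrying the feature,
    n: drugs known to cause the reaction, k: drugs with both.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def enrichment_for_feature(feature: str,
                           drug_features: Dict[str, Set[str]],
                           positives: Set[str]) -> float:
    """Enrichment p-value of one feature among the positive drugs."""
    with_feature = {d for d, feats in drug_features.items() if feature in feats}
    known = {d for d in positives if d in drug_features}
    return hypergeom_enrichment_p(
        k=len(with_feature & known),
        K=len(with_feature),
        n=len(known),
        N=len(drug_features),
    )


# Hypothetical toy graph: drug -> protein targets, plus drugs known to cause one reaction.
DRUG_TARGETS = {
    "drugA": {"T1", "T2"},
    "drugB": {"T1"},
    "drugC": {"T1", "T3"},
    "drugD": {"T3"},
    "drugE": {"T2"},
    "drugF": {"T3"},
}
CAUSES_REACTION = {"drugA", "drugB", "drugC"}

if __name__ == "__main__":
    for target in ("T1", "T2", "T3"):
        print(target, enrichment_for_feature(target, DRUG_TARGETS, CAUSES_REACTION))
```

In this toy graph T1 is shared by all three known causes and by no other drug, so it gets the smallest p-value; features enriched in this way could then be used to score drugs with no known link to the reaction.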
Exploiting Linked Data and Knowledge Graphs in Large Organisations | 2017
Ronald Denaux; Yuan Ren; Boris Villazon-Terrazas; Panos Alexopoulos; Alessandro Faraotti; Honghan Wu
In this chapter, we provide a high-level overview of what is needed to create, maintain, and exploit knowledge graphs for a real application.