Roxana Danger
Imperial College London
Publications
Featured research published by Roxana Danger.
International Provenance and Annotation Workshop | 2014
Paolo Missier; Jeremy Bryans; Carl Gamble; Vasa Curcin; Roxana Danger
Provenance metadata can be valuable in data sharing settings, where it can be used to help data consumers form judgements regarding the reliability of the data produced by third parties. However, some parts of provenance may be sensitive, requiring access control, or they may need to be simplified for the intended audience. Both these issues can be addressed by a single mechanism for creating abstractions over provenance, coupled with a policy model to drive the abstraction. Such a mechanism, which we refer to as abstraction by grouping, simultaneously achieves partial disclosure of provenance and facilitates its consumption. In this paper we introduce a formal foundation for this type of abstraction, grounded in the W3C PROV model; describe the associated policy model; and briefly present its implementation, the ProvAbs tool for interactive experimentation with policies and abstractions.
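The core idea of abstraction by grouping can be illustrated with a minimal sketch (not the ProvAbs implementation or the W3C PROV serialization; the function and node names here are hypothetical): a set of sensitive nodes in a provenance graph is collapsed into one abstract node, with boundary-crossing edges redirected to it.

```python
def abstract_by_grouping(edges, group, abstract_node):
    """Collapse the nodes in `group` into a single abstract node.

    `edges` is a set of (source, target) pairs of a provenance graph.
    Edges internal to the group disappear; edges crossing the group
    boundary are redirected to the abstract node.
    """
    def rename(node):
        return abstract_node if node in group else node

    new_edges = set()
    for src, dst in edges:
        s, d = rename(src), rename(dst)
        if s != d:  # drop edges that became internal to the group
            new_edges.add((s, d))
    return new_edges

# A tiny lineage: report <- analysis <- {raw_a, raw_b}
edges = {("analysis", "raw_a"), ("analysis", "raw_b"), ("report", "analysis")}
abstracted = abstract_by_grouping(edges, {"raw_a", "raw_b"}, "sources")
# The two raw inputs now appear as a single opaque "sources" node
```

Because the grouped nodes are replaced rather than deleted, the result is still a connected lineage graph, which is what makes the abstraction simultaneously a disclosure control and a simplification.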
Future Generation Computer Systems | 2014
Vasa Curcin; Simon Miles; Roxana Danger; Yuhui Chen; Richard Bache; Adel Taweel
The provenance of a piece of data refers to knowledge about its origin, in terms of the entities and actors involved in its creation, e.g. data sources used, operations carried out on them, and users enacting those operations. Provenance is used to better understand the data and the context of its production, and to assess its reliability, by asserting whether correct procedures were followed. Providing evidence for validating research is of particular importance in the biomedical domain, where the strength of the results depends on the data sources and processes used. In recent times, previously manual processes have become fully or semi-automated, e.g. clinical trial recruitment, epidemiological studies, diagnosis making. The latter is typically achieved through interactions of heterogeneous software systems in multiple settings (hospitals, clinics, academic and industrial research organisations). Provenance traces of these software systems need to be integrated in a consistent and meaningful manner, but since the systems rarely share a common platform, provenance interoperability between them has to be achieved at the level of conceptual models. It is a non-trivial matter to determine where to start in making a biomedical software system provenance-aware. In this paper, we specify recommendations to developers on how to approach provenance modelling, capture, security, storage and querying, based on our experiences with two large-scale biomedical research projects: Translational Research and Patient Safety in Europe (TRANSFoRm) and Electronic Health Records for Clinical Research (EHR4CR). While illustrated with concrete issues encountered, the recommendations are of a sufficiently high level so as to be reusable across the biomedical domain.
Journal of Algorithms | 2009
Roxana Danger; Rafael Berlanga
A novel Information Extraction system able to generate complex instances from free texts available on the Web is presented in this paper. The approach is based on non-monotonic processing over ontologies, and makes use of entity recognizers and disambiguators in order to adequately extract and combine instances and the relations between them. Experiments conducted over the archaeological research domain provide satisfactory results and suggest that the tool is suitable for application to Semantic Web resources.
BMC Family Practice | 2015
Jean Karl Soler; Derek Corrigan; Przemyslaw Kazienko; Tomasz Kajdanowicz; Roxana Danger; Marcin Kulisiewicz; Brendan Delaney
Background: Analysis of encounter data relevant to the diagnostic process sourced from routine electronic medical record (EMR) databases represents a classic example of the concept of a learning healthcare system (LHS). By collecting International Classification of Primary Care (ICPC) coded EMR data as part of the Transition Project from Dutch and Maltese databases (using the EMR TransHIS), data mining algorithms can empirically quantify the relationships of all presenting reasons for encounter (RfEs) and recorded diagnostic outcomes. We have specifically looked at new episodes of care (EoC) for two urinary system infections: simple urinary tract infection (UTI, ICPC code: U71) and pyelonephritis (ICPC code: U70).
Methods: Participating family doctors (FDs) recorded details of all their patient contacts in an EoC structure using the ICPC, including RfEs presented by the patient, and the FDs' diagnostic labels. The relationships between RfEs and episode titles were studied using probabilistic and data mining methods as part of the TRANSFoRm project.
Results: The Dutch data indicated that the presence of the RfEs "Cystitis/Urinary Tract Infection", "Dysuria", "Fear of UTI", "Urinary frequency/urgency", "Haematuria", and "Urine symptom/complaint, other" are all strong, reliable predictors for the diagnosis "Cystitis/Urinary Tract Infection". The Maltese data indicated that the presence of the RfEs "Dysuria", "Urinary frequency/urgency", and "Haematuria" are all strong, reliable predictors for the diagnosis "Cystitis/Urinary Tract Infection". The Dutch data indicated that the presence of the RfEs "Flank/axilla symptom/complaint", "Dysuria", "Fever", "Cystitis/Urinary Tract Infection", and "Abdominal pain/cramps general" are all strong, reliable predictors for the diagnosis "Pyelonephritis". The Maltese data set did not present any clinically and statistically significant predictors for pyelonephritis.
Conclusions: We describe clinically and statistically significant diagnostic associations observed between UTIs and pyelonephritis presenting as a new problem in family practice, and all associated RfEs, and demonstrate that the significant diagnostic cues obtained are consistent with the literature. We conclude that it is possible to generate clinically meaningful diagnostic evidence from electronic sources of patient data.
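The kind of RfE-to-diagnosis association mined here can be sketched in a few lines (a toy illustration with invented episode records and simplified ICPC-style labels, not the Transition Project data or the TRANSFoRm algorithms): compare how often a diagnosis is recorded when a given RfE is present against its overall base rate.

```python
def predictive_strength(episodes, rfe, diagnosis):
    """Compare P(diagnosis | rfe present) with the base rate P(diagnosis).

    Each episode is a dict with a set of reasons for encounter ("rfes")
    and the recorded diagnostic label ("dx").
    """
    with_rfe = [e for e in episodes if rfe in e["rfes"]]
    p_given = sum(e["dx"] == diagnosis for e in with_rfe) / len(with_rfe)
    p_base = sum(e["dx"] == diagnosis for e in episodes) / len(episodes)
    return p_given, p_base

# Invented encounter records for illustration only
episodes = [
    {"rfes": {"Dysuria"}, "dx": "Cystitis/UTI"},
    {"rfes": {"Dysuria", "Urinary frequency/urgency"}, "dx": "Cystitis/UTI"},
    {"rfes": {"Fever", "Flank symptom"}, "dx": "Pyelonephritis"},
    {"rfes": {"Abdominal pain"}, "dx": "Gastroenteritis"},
]
p_given, p_base = predictive_strength(episodes, "Dysuria", "Cystitis/UTI")
# A large gap between p_given and p_base marks "Dysuria" as a strong cue
```

Real analyses would of course add significance testing over thousands of episodes; the point is only that the predictor relationships are directly computable from routinely coded EoC data.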
Future Generation Computer Systems | 2015
Roxana Danger; Vasa Curcin; Paolo Missier; Jeremy Bryans
Data provenance refers to the knowledge about data sources and operations carried out to obtain some piece of data. A provenance-enabled system maintains a record of the interoperation of processes across different modules, stages and authorities to capture the full lineage of the resulting data, and typically allows data-focused audits using semantic technologies, such as ontologies, that capture domain knowledge. However, regulating access to captured provenance data is a non-trivial problem, since execution records form complex, overlapping graphs with individual nodes possibly being subject to different access policies. Applying traditional access control to provenance queries can either hide the entire graph from a user who is denied access to any of its nodes, reveal too much information, or return a semantically invalid graph. An alternative approach is to answer queries with a new graph that abstracts over the missing nodes and fragments. In this paper, we present TACLP, an access control language for provenance data that supports this approach, together with an algorithm that transforms graphs according to sets of access restrictions. The algorithm produces safe and valid provenance graphs that retain the maximum amount of information allowed by the security model. The approach is demonstrated on an example of restricting access to a clinical trial provenance trace.
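The alternative approach described, answering a query with a transformed graph rather than denying it outright, can be sketched as follows (a simplified illustration, not the TACLP language or its transformation algorithm; the policy predicate and node names are hypothetical): nodes failing the access check are replaced by a shared opaque placeholder so the surrounding lineage remains a valid graph.

```python
def apply_policy(edges, allowed):
    """Replace every node that fails `allowed` with an opaque placeholder.

    `edges` is a set of (source, target) pairs. Edges between two
    restricted nodes vanish inside the placeholder; edges crossing the
    restriction boundary are kept, so lineage paths stay intact.
    """
    def view(node):
        return node if allowed(node) else "[restricted]"

    visible = set()
    for src, dst in edges:
        s, d = view(src), view(dst)
        if s != d:  # edges wholly inside the restricted region collapse
            visible.add((s, d))
    return visible

# Clinical-trial-style lineage with sensitive patient-level nodes
trial = {("result", "analysis"),
         ("analysis", "patient_record"),
         ("patient_record", "consent_form")}
public = apply_policy(trial, lambda n: n not in {"patient_record", "consent_form"})
# The consumer sees that "analysis" used restricted inputs, but not what they were
```

The returned graph is safe (restricted identities are hidden) yet still semantically valid: the consumer can see that restricted inputs exist without learning their content.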
Journal of Biomedical Informatics | 2010
Roxana Danger; Isabel Segura-Bedmar; Paloma Martínez; Paolo Rosso
Important progress in treating diseases has been possible thanks to the identification of drug targets. Drug targets are the molecular structures whose abnormal activity, associated with a disease, can be modified by drugs, improving the health of patients. The pharmaceutical industry needs to give priority to their identification and validation in order to reduce the long and costly drug development times. In the last two decades, our knowledge about drugs, their mechanisms of action and drug targets has rapidly increased. Nevertheless, most of this knowledge is hidden in millions of medical articles and textbooks. Extracting knowledge from this large amount of unstructured information is a laborious job, even for human experts. Drug target article identification, a crucial first step toward the automatic extraction of information from texts, constitutes the aim of this paper. A comparison of several machine learning techniques has been performed in order to obtain a satisfactory classifier for detecting drug target articles using semantic information from biomedical resources such as the Unified Medical Language System. The best result has been achieved by a Fuzzy Lattice Reasoning classifier, which reaches a ROC area of 98%.
Journal of Biomedical Informatics | 2017
Vasa Curcin; Elliot Fairweather; Roxana Danger; Derek Corrigan
Decision support systems are used as a method of promoting consistent guideline-based diagnosis supporting clinical reasoning at the point of care. However, despite the availability of numerous commercial products, the wider acceptance of these systems has been hampered by concerns about diagnostic performance and a perceived lack of transparency in the process of generating clinical recommendations. This resonates with the Learning Health System paradigm that promotes data-driven medicine relying on routine data capture and transformation, which also stresses the need for trust in an evidence-based system. Data provenance is a way of automatically capturing the trace of a research task and its resulting data, thereby facilitating trust and the principles of reproducible research. While computational domains have started to embrace this technology through provenance-enabled execution middlewares, traditionally non-computational disciplines, such as medical research, that do not rely on a single software platform, are still struggling with its adoption. In order to address these issues, we introduce provenance templates - abstract provenance fragments representing meaningful domain actions. Templates can be used to generate a model-driven service interface for domain software tools to routinely capture the provenance of their data and tasks. This paper specifies the requirements for a Decision Support tool based on the Learning Health System, introduces the theoretical model for provenance templates and demonstrates the resulting architecture. Our methods were tested and validated on the provenance infrastructure for a Diagnostic Decision Support System that was developed as part of the EU FP7 TRANSFoRm project.
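The template idea can be sketched in miniature (an illustrative toy, not the paper's formal model or service interface; the variable convention and identifiers are invented): a template is a provenance fragment with variable placeholders, and instantiation substitutes concrete identifiers supplied by the domain tool.

```python
# A hypothetical template for one decision-support run: an activity that
# uses an input entity and generates an output entity (PROV-style relations).
TEMPLATE = [
    ("var:output", "wasGeneratedBy", "var:task"),
    ("var:task", "used", "var:input"),
]

def instantiate(template, bindings):
    """Substitute concrete identifiers for the template's variables."""
    def sub(term):
        return bindings[term] if term.startswith("var:") else term
    return [(sub(a), rel, sub(b)) for a, rel, b in template]

# The domain tool only supplies bindings; the provenance structure is fixed
record = instantiate(TEMPLATE, {
    "var:input": "symptom_set_42",
    "var:output": "diagnosis_report_42",
    "var:task": "rule_engine_run_17",
})
```

This separation is the appeal of the approach: domain software never builds provenance graphs by hand, it only fills in template variables, so every captured fragment is structurally well-formed by construction.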
Database and Expert Systems Applications | 2004
Roxana Danger; Rafael Berlanga; José Ruiz-Shulcloper
Currently, the main drawback for the development of the Semantic Web stems from the manual tagging of web pages according to a given ontology that conceptualizes its domain. This task is usually hard, even for experts, and it is prone to errors due to the different interpretations users can have about the same documents. In this paper we address the problem of automatically generating ontology instances starting from a collection of unstructured documents (e.g. plain texts, HTML pages, etc.). These instances will populate the Semantic Web that is described by the ontology. The proposed approach combines Information Extraction techniques, mainly entity recognition, with information merging and Text Mining techniques. This approach has been successfully applied in the development of a Semantic Web for archaeology research.
Archive | 2013
Vasa Curcin; Roxana Danger; Wolfgang Kuchinke; Simon Miles; Adel Taweel; Christian Ohmann
This chapter proposes a provenance model for the clinical research domain, focusing on the planning and conduct of randomized controlled trials, and the subsequent analysis and reporting of results from those trials. We look at the provenance requirements for clinical research and trial management of different stakeholders (researchers, clinicians, participants, IT staff) to identify elements needed at multiple levels and stages of the process. In order to address these challenges, a provenance model is defined by extending the Open Provenance Model with domain-specific additions that tie the representation closer to the expertise of medical users, and with the ultimate aim of creating the first OPM profile for randomized controlled clinical trials. As a starting point, we used the domain information model developed at the University of Düsseldorf, which conforms to the ICH Guideline for Good Clinical Practice (GCP) standard, thereby ensuring the wider applicability of our work. The application of the model is demonstrated on several examples and queries based on the integrated trial data being captured as part of the TRANSFoRm EU FP7 project.
Archive | 2007
Roxana Danger; Rafael Berlanga
New data warehouse tools for the Semantic Web are becoming more and more necessary. The present paper formalizes one such tool, considering, on the one hand, the semantics and theoretical foundations of Description Logic and, on the other hand, current developments in data generalization. The presented model consists of dimensions and multidimensional schemata and spaces. An algorithm to retrieve interesting spaces according to the data distribution is also proposed. Some ideas from Data Mining techniques are included in order to allow users to discover knowledge from the Semantic Web.
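The notion of a dimension built from an ontology can be sketched with a toy roll-up (an illustration under invented names, not the paper's formal model): a concept hierarchy acts as a dimension, and generalizing instances up the hierarchy aggregates facts at a coarser level.

```python
from collections import Counter

# A hypothetical concept hierarchy serving as a dimension
PARENT = {
    "amphora": "ceramic", "bowl": "ceramic",
    "ceramic": "artefact", "coin": "artefact",
}

def roll_up(concept, levels=1):
    """Generalize a concept along the hierarchy, like an OLAP roll-up."""
    for _ in range(levels):
        concept = PARENT.get(concept, concept)
    return concept

# Aggregate individual finds at the parent concept level
finds = ["amphora", "bowl", "bowl", "coin"]
cube = Counter(roll_up(f) for f in finds)
# Counts are now grouped by the more general concepts
```

Choosing which level of the hierarchy yields an "interesting" space is exactly where the paper's data-distribution-driven algorithm would come in; the sketch only shows the generalization step itself.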