
Publication


Featured research published by Kristina M. Hettne.


Bioinformatics | 2009

A dictionary to identify small molecules and drugs in free text

Kristina M. Hettne; R.H. Stierum; Martijn J. Schuemie; Peter J. M. Hendriksen; Bob J. A. Schijvenaars; Erik M. van Mulligen; Jos Kleinjans; Jan A. Kors

Motivation: Much effort has been spent by the scientific community on the correct identification of gene and protein names in text, while less effort has been spent on the correct identification of chemical names. Dictionary-based term identification has the power to recognize the diverse representations of chemical information in the literature and to map the chemicals to their database identifiers. Results: We developed a dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus. Rule-based term filtering, manual checking of highly frequent terms, and disambiguation rules were applied. We tested the combined dictionary and the dictionaries derived from the individual resources on an annotated corpus, and conclude the following: (i) each of the processing steps increases precision with a minor loss of recall; (ii) the overall performance of the combined dictionary is acceptable (precision 0.67, recall 0.40 (0.80 for trivial names)); (iii) the combined dictionary performed better than the dictionary in the chemical recognizer OSCAR3; (iv) the performance of a dictionary based on ChemIDplus alone is comparable to that of the combined dictionary. Availability: The combined dictionary is freely available as an XML file in Simple Knowledge Organization System (SKOS) format at http://www.biosemantics.org/chemlist.
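As a rough illustration of dictionary-based term identification against a SKOS dictionary such as Chemlist, the Python sketch below parses a local copy of the SKOS file with rdflib and performs a simple greedy longest-match lookup. The file name and the matching strategy are illustrative assumptions, not the pipeline used in the paper.

```python
# Minimal sketch of dictionary-based chemical term identification, assuming the
# Chemlist dictionary is available locally as SKOS RDF/XML ("chemlist.xml" is
# a hypothetical file name).
import re
from rdflib import Graph
from rdflib.namespace import SKOS

def normalize(text):
    """Lower-case and split a string into word tokens."""
    return re.findall(r"\w[\w\-]*", text.lower())

g = Graph()
g.parse("chemlist.xml", format="xml")  # hypothetical local copy of the SKOS file

# Map every normalized preferred and alternative label to its concept URI.
lexicon = {}
for predicate in (SKOS.prefLabel, SKOS.altLabel):
    for concept, _, label in g.triples((None, predicate, None)):
        lexicon.setdefault(" ".join(normalize(str(label))), concept)

def tag_chemicals(text, max_len=5):
    """Greedy longest-match lookup of dictionary terms in the token stream."""
    tokens = normalize(text)
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                hits.append((candidate, lexicon[candidate]))
                i += n
                break
        else:
            i += 1
    return hits

print(tag_chemicals("Patients received acetylsalicylic acid and caffeine."))
```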


international conference on e-science | 2012

Why workflows break — Understanding and combating decay in Taverna workflows

Jun Zhao; José Manuél Gómez-Pérez; Khalid Belhajjame; Graham Klyne; Esteban García-Cuesta; Aleix Garrido; Kristina M. Hettne; Marco Roos; David De Roure; Carole A. Goble

Workflows provide a popular means for preserving scientific methods by explicitly encoding their process. However, some of them are subject to decay in their ability to be re-executed or to reproduce the same results over time, largely due to the volatility of the resources required for workflow execution. This paper provides an analysis of the root causes of workflow decay based on an empirical study of a collection of Taverna workflows from the myExperiment repository. Although our analysis was based on a specific type of workflow, the outcomes and methodology should be applicable to workflows from other systems, at least those whose executions also rely largely on accessing third-party resources. Based on our understanding of decay, we recommend a minimal set of auxiliary resources to be preserved together with the workflows as an aggregation object, and we provide a software tool for end users to create such aggregations and to assess their completeness.
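One decay cause identified above is the disappearance of third-party services a workflow calls. A minimal Python sketch of such a reachability check follows; the endpoint URLs are examples chosen for illustration only, and the paper's tool performs richer checks than this.

```python
# Minimal sketch of probing whether third-party web services a workflow
# depends on are still reachable. The URL list is hypothetical.
import urllib.request
import urllib.error

service_urls = [
    "https://www.ebi.ac.uk/Tools/services/rest/",  # example endpoints, illustration only
    "https://rest.kegg.jp/info/kegg",
]

def is_reachable(url, timeout=10):
    """Return True if the endpoint answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        return False

for url in service_urls:
    status = "OK" if is_reachable(url) else "UNREACHABLE (possible workflow decay)"
    print(f"{url}: {status}")
```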


Journal of Web Semantics | 2015

Using a suite of ontologies for preserving workflow-centric research objects

Khalid Belhajjame; Jun Zhao; Daniel Garijo; Matthew Gamble; Kristina M. Hettne; Raúl Palma; Eleni Mina; Oscar Corcho; José Manuél Gómez-Pérez; Sean Bechhofer; Graham Klyne; Carole A. Goble

Scientific workflows are a popular mechanism for specifying and automating data-driven in silico experiments. A significant aspect of their value lies in their potential to be reused. Once shared, workflows become useful building blocks that can be combined or modified for developing new experiments. However, previous studies have shown that storing workflow specifications alone is not sufficient to ensure that they can be successfully reused, without being able to understand what the workflows aim to achieve or to re-enact them. To gain an understanding of the workflow, and how it may be used and repurposed for their needs, scientists require access to additional resources such as annotations describing the workflow, datasets used and produced by the workflow, and provenance traces recording workflow executions. In this article, we present a novel approach to the preservation of scientific workflows through the application of research objects: aggregations of data and metadata that enrich the workflow specifications. Our approach is realised as a suite of ontologies that support the creation of workflow-centric research objects. Their design was guided by requirements elicited from previous empirical analyses of workflow decay and repair. The ontologies developed make use of and extend existing well-known ontologies, namely the Object Reuse and Exchange (ORE) vocabulary, the Annotation Ontology (AO) and the W3C PROV ontology (PROV-O). We illustrate the application of the ontologies for building Workflow Research Objects with a case study that investigates Huntington's disease, performed in collaboration with a team from the Leiden University Medical Centre (HG-LUMC). Finally, we present a number of tools developed for creating and managing workflow-centric research objects.
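To make the combination of ORE aggregation and PROV-O provenance concrete, here is a small rdflib sketch that builds a toy workflow-centric research object. The base namespace http://example.org/ro/ and all resource names are hypothetical placeholders, not the ontology suite defined in the paper.

```python
# Minimal sketch of a workflow-centric Research Object expressed with rdflib,
# using ORE aggregation and PROV-O terms. All URIs below are hypothetical.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

ORE = Namespace("http://www.openarchives.org/ore/terms/")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/ro/")  # hypothetical base namespace

g = Graph()
g.bind("ore", ORE)
g.bind("prov", PROV)

ro = EX["research-object-1"]
workflow = EX["workflow.t2flow"]
dataset = EX["input-data.csv"]
run = EX["run-2014-01-01"]

g.add((ro, RDF.type, ORE.Aggregation))          # the RO aggregates its parts
g.add((ro, ORE.aggregates, workflow))
g.add((ro, ORE.aggregates, dataset))
g.add((run, RDF.type, PROV.Activity))           # a recorded workflow execution
g.add((run, PROV.used, dataset))                # provenance: the run consumed the dataset
g.add((workflow, RDFS.label, Literal("Metabolite variation workflow")))

print(g.serialize(format="turtle"))
```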


Current Topics in Medicinal Chemistry | 2005

Chemical and Biological Profiling of an Annotated Compound Library Directed to the Nuclear Receptor Family

Montserrat Cases; Ricard Garcia-Serna; Kristina M. Hettne; Marc Weeber; Johan van der Lei; Scott Boyer; Jordi Mestres

Nuclear receptors form a family of ligand-activated transcription factors that regulate a wide variety of biological processes and are thus generally considered relevant targets in drug discovery. We have constructed an annotated compound library directed to nuclear receptors (NRacl) as a means for integrating the chemical and biological data being generated within this family. Special care has been put in the appropriate storage of annotations by using hierarchical classification schemes for both molecules and nuclear receptors, which takes the ability to extract knowledge from annotated compound libraries to another level. Analysis of NRacl has ultimately led to the identification of scaffolds with highly promiscuous nuclear receptor profiles and to the classification of nuclear receptor groups with similar scaffold promiscuity patterns. This information can be exploited in the design of probing libraries for deorphanization activities as well as for devising screening batteries to address selectivity issues.
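One analysis described above, ranking scaffolds by how many distinct nuclear receptors their compounds are annotated against, can be sketched in a few lines of Python. The (scaffold, receptor) annotation pairs below are hypothetical toy values, not the NRacl data.

```python
# Minimal sketch of ranking scaffolds by nuclear receptor promiscuity,
# i.e. the number of distinct receptors a scaffold is annotated against.
from collections import defaultdict

annotations = [  # hypothetical (scaffold, receptor) annotation pairs
    ("biphenyl", "ERalpha"), ("biphenyl", "PPARgamma"), ("biphenyl", "RXRalpha"),
    ("steroid-core", "ERalpha"), ("steroid-core", "AR"),
    ("benzamide", "PPARgamma"),
]

receptors_per_scaffold = defaultdict(set)
for scaffold, receptor in annotations:
    receptors_per_scaffold[scaffold].add(receptor)

# Higher counts indicate more promiscuous scaffolds across the receptor family.
for scaffold, receptors in sorted(receptors_per_scaffold.items(),
                                  key=lambda kv: -len(kv[1])):
    print(f"{scaffold}: {len(receptors)} receptors ({', '.join(sorted(receptors))})")
```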


Journal of Biomedical Discovery and Collaboration | 2007

Applied information retrieval and multidisciplinary research: new mechanistic hypotheses in complex regional pain syndrome.

Kristina M. Hettne; Marissa de Mos; Anke Gj de Bruijn; Marc Weeber; Scott Boyer; Erik M. van Mulligen; Montserrat Cases; Jordi Mestres; Johan van der Lei

Background: Collaborative efforts of physicians and basic scientists are often necessary in the investigation of complex disorders. Difficulties can arise, however, when large amounts of information need to be reviewed. Advanced information retrieval can be beneficial in combining and reviewing data obtained from the various scientific fields. In this paper, a team of investigators with varying backgrounds has applied advanced information retrieval methods, in the form of text-mining and entity-relationship tools, to review the current literature, with the intention of generating new insights into the molecular mechanisms underlying a complex disorder. As an example of such a disorder, the Complex Regional Pain Syndrome (CRPS) was chosen. CRPS is a painful and debilitating syndrome with a complex etiology that is still largely unresolved, resulting in suboptimal diagnosis and treatment. Results: A text-mining-based approach combined with a simple network analysis identified Nuclear Factor kappa B (NFκB) as a possible central mediator in both the initiation and progression of CRPS. Conclusion: The result shows the added value of a multidisciplinary approach combined with information retrieval for hypothesis discovery in biomedical research. The new hypothesis, which was derived in silico, provides a framework for further mechanistic studies into the underlying molecular mechanisms of CRPS and requires evaluation in clinical and epidemiological studies.
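The "simple network analysis" mentioned above can be approximated as follows: build a graph from concept co-occurrence pairs produced by text mining and rank concepts by degree centrality. The pairs in this Python sketch are toy values, not the CRPS literature data.

```python
# Minimal sketch of a co-occurrence network built from text-mined concept
# pairs, ranking nodes by degree centrality. The pair list is a toy example.
import networkx as nx

cooccurrences = [
    ("NFKB", "TNF"), ("NFKB", "IL6"), ("NFKB", "substance P"),
    ("TNF", "IL6"), ("IL6", "CRP"), ("NFKB", "bradykinin"),
]

G = nx.Graph()
G.add_edges_from(cooccurrences)

# A highly central node is a candidate "central mediator" for the hypothesis.
centrality = nx.degree_centrality(G)
for concept, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {score:.2f}")
```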


Journal of Cheminformatics | 2010

Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

Kristina M. Hettne; Antony J. Williams; Erik M. van Mulligen; Jos Kleinjans; Valery Tkachenko; Jan A. Kors

Background: Previously, we developed a combined dictionary, dubbed Chemlist, for the identification of small molecules and drugs in text, based on a number of publicly available databases, and tested it on an annotated corpus. To achieve acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results: We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80,000 names was only a third to a quarter the size of Chemlist (around 300,000 names). The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation, and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation, and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions: We conclude the following: (1) the ChemSpider dictionary achieved the best precision, but the Chemlist dictionary had a higher recall and the best F-score; (2) rule-based filtering and disambiguation are necessary to achieve high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist.
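The comparison above comes down to computing precision, recall and F-score of dictionary hits against a gold-standard annotated corpus. A minimal Python sketch with hypothetical annotation spans follows.

```python
# Minimal sketch of span-level precision/recall/F-score evaluation.
# The (doc_id, start, end) spans below are hypothetical examples.
gold = {("doc1", 10, 19), ("doc1", 40, 47), ("doc2", 5, 12)}
predicted = {("doc1", 10, 19), ("doc2", 5, 12), ("doc2", 30, 38)}

tp = len(gold & predicted)                      # true positives: exact span matches
precision = tp / len(predicted) if predicted else 0.0
recall = tp / len(gold) if gold else 0.0
f_score = (2 * precision * recall / (precision + recall)
           if precision + recall else 0.0)

print(f"precision={precision:.2f} recall={recall:.2f} F={f_score:.2f}")
```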


Journal of Biomedical Semantics | 2010

Rewriting and suppressing UMLS terms for improved biomedical term identification

Kristina M. Hettne; Erik M. van Mulligen; Martijn J. Schuemie; Bob J. A. Schijvenaars; Jan A. Kors

Background: Identification of terms is essential for biomedical text mining. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact of the different rules on the number of terms identified in a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms, together with a sample of 100 randomly selected terms, were evaluated for every rule. Results: Five of the nine rewrite rules were found to generate additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms, and seven of the eight suppression rules were found to suppress only undesired terms. Using the five rewrite rules that passed our evaluation, we were able to identify 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without the rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, an increase of 2.8% in the number of terms and of 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size. 7,397 terms were suppressed in the corpus. Conclusions: We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is to be used for biomedical term identification in MEDLINE. A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.
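The flavor of such rules can be illustrated with a short Python sketch. The two rewrite patterns and one suppression pattern below are invented examples in the same spirit; they are not the nine rewrite and eight suppression rules evaluated in the paper.

```python
# Minimal sketch of applying hypothetical rewrite and suppression rules to a
# list of UMLS-style terms. Rules and terms are illustrative only.
import re

terms = [
    "Myocardial infarction, NOS",
    "Heart attack",
    "1 ml prefilled syringe",          # dosage-like term a suppression rule might drop
    "Diabetes mellitus (disorder)",
]

def rewrite(term):
    term = re.sub(r",\s*NOS$", "", term)        # strip "not otherwise specified" suffix
    term = re.sub(r"\s*\([^)]*\)$", "", term)   # strip trailing qualifier in parentheses
    return term.strip()

def suppress(term):
    return bool(re.match(r"^\d", term))          # e.g. drop terms starting with a number

kept = [rewrite(t) for t in terms if not suppress(t)]
print(kept)   # ['Myocardial infarction', 'Heart attack', 'Diabetes mellitus']
```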


Briefings in Bioinformatics | 2011

Literature-aided interpretation of gene expression data with the weighted global test

Rob Jelier; Jelle J. Goeman; Kristina M. Hettne; Martijn J. Schuemie; Johan T. den Dunnen; Peter A. C. 't Hoen

Most methods for the interpretation of gene expression profiling experiments rely on the categorization of genes, as provided by the Gene Ontology (GO) and pathway databases. Due to the manual curation process, such databases are never up to date and tend to be limited in focus and coverage. Automated literature mining tools provide an attractive alternative approach. We review how they can be employed for the interpretation of gene expression profiling experiments. We illustrate that their comprehensive scope aids the interpretation of data from domains poorly covered by GO or alternative databases, and allows for the linking of gene expression with diseases, drugs, tissues and other types of concepts. A framework for proper statistical evaluation of the associations between gene expression values and literature concepts was lacking and is now implemented in a weighted extension of the global test. The weights are the literature association scores and reflect the importance of a gene for the concept of interest. In a direct comparison with classical GO-based gene sets, we show that the use of literature-based associations results in the identification of much more specific GO categories. We demonstrate the possibilities for linking gene expression data to patient survival in breast cancer and to the action and metabolism of drugs. Coupling with online literature mining tools ensures transparency and allows further study of the identified associations. Literature mining tools are therefore powerful additions to the toolbox for the interpretation of high-throughput genomics data.
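The idea of weighting genes by literature association scores can be sketched with a small permutation test in numpy. This is an illustrative approximation under simplifying assumptions (binary outcome, random toy data), not the actual weighted global test implementation.

```python
# Minimal sketch: genes are weighted by a literature association score, the
# weighted sum of squared per-gene association scores is the test statistic,
# and a permutation null gives the p-value. Toy data throughout.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes = 40, 25
X = rng.normal(size=(n_samples, n_genes))             # expression matrix (samples x genes)
y = rng.integers(0, 2, size=n_samples).astype(float)  # binary outcome
w = rng.uniform(0, 1, size=n_genes)                   # literature association weights

def weighted_stat(X, y, w):
    r = y - y.mean()                 # centred outcome
    s = X.T @ r                      # per-gene association scores
    return float(np.sum(w * s**2))   # weights emphasise literature-linked genes

observed = weighted_stat(X, y, w)
null = np.array([weighted_stat(X, rng.permutation(y), w) for _ in range(2000)])
p_value = (np.sum(null >= observed) + 1) / (len(null) + 1)
print(f"weighted statistic={observed:.1f}, permutation p={p_value:.3f}")
```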


Journal of Biomedical Semantics | 2014

Structuring research methods and data with the research object model: genomics workflows as a case study.

Kristina M. Hettne; Harish Dharuri; Jun Zhao; Katherine Wolstencroft; Khalid Belhajjame; Stian Soiland-Reyes; Eleni Mina; Mark Thompson; Don C. Cruickshank; L. Verdes-Montenegro; Julián Garrido; David De Roure; Oscar Corcho; Graham Klyne; Reinout van Schouwen; Peter A. C. 't Hoen; Sean Bechhofer; Carole A. Goble; Marco Roos

Background: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide the necessary metadata for a scientist to understand and recreate the results of an experiment. To support this, we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study in which we analysed human metabolite variation by workflows. Results: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as “which particular data was input to a particular workflow to test a particular hypothesis?” and “which particular conclusions were drawn from a particular workflow?”. Conclusions: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. Availability: The Research Object is available at http://www.myexperiment.org/packs/428. The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.
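A question such as "which data was input to a particular workflow run" maps naturally onto a SPARQL query over the RO's provenance graph. The Python sketch below assumes a hypothetical local Turtle file and uses only standard PROV-O terms, not the full Wf4Ever Research Object Model.

```python
# Minimal sketch of querying a Research Object graph for workflow inputs,
# using rdflib and PROV-O. The manifest file name is a hypothetical placeholder.
from rdflib import Graph

g = Graph()
g.parse("research_object.ttl", format="turtle")  # hypothetical RO manifest

query = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?run ?input
WHERE {
    ?run a prov:Activity ;
         prov:used ?input .
}
"""
for run, data_input in g.query(query):
    print(f"run {run} used input {data_input}")
```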


BMC Medical Genomics | 2013

Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

Kristina M. Hettne; André Boorsma; Dorien A.M. van Dartel; Jelle J. Goeman; Esther de Jong; Aldert H. Piersma; Rob Stierum; Jos Kleinjans; Jan A. Kors

Background: The availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods: We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared the results to those from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results: Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significantly against the PPARA GE signature. Thirty-three environmental toxicant gene sets were significantly altered in the triazole GE data sets. Twenty-one of these toxicants had a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions: Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.
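The core of the gene set testing described above (score each gene set for differential expression and keep sets that pass an FDR threshold) can be sketched as follows. The sketch uses a per-gene t-test, Fisher's combination for the set-level p-value and Benjamini-Hochberg correction, which is a simplified stand-in for the GSA method used in the paper; all data and set names are toy placeholders.

```python
# Minimal sketch of gene set testing with FDR correction on toy data.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
genes = [f"g{i}" for i in range(200)]
expr_treated = rng.normal(size=(10, 200))   # toy expression data (samples x genes)
expr_control = rng.normal(size=(10, 200))
gene_sets = {f"set{i}": list(rng.choice(genes, 20, replace=False)) for i in range(30)}

# Per-gene differential expression p-values (two-sample t-test).
_, gene_p = stats.ttest_ind(expr_treated, expr_control, axis=0)
gene_p = dict(zip(genes, gene_p))

# Set-level p-value: combine member-gene p-values with Fisher's method.
set_p = {name: stats.combine_pvalues([gene_p[g] for g in members])[1]
         for name, members in gene_sets.items()}

# FDR (Benjamini-Hochberg) correction across gene sets; keep q < 0.05.
names = list(set_p)
reject, q_values, _, _ = multipletests([set_p[n] for n in names], method="fdr_bh")
significant = [n for n, r in zip(names, reject) if r]
print("significant gene sets:", significant)
```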

Collaboration


Dive into Kristina M. Hettne's collaborations.

Top Co-Authors

Marco Roos (Leiden University Medical Center)
Eleni Mina (Leiden University Medical Center)
Jan A. Kors (Nanyang Technological University)
Mark Thompson (Leiden University Medical Center)
Peter A. C. 't Hoen (Leiden University Medical Center)
Erik M. van Mulligen (Erasmus University Medical Center)
Rajaram Kaliyaperumal (Leiden University Medical Center)
Barend Mons (Leiden University Medical Center)