Sofie Van Landeghem
Ghent University
Publications
Featured research published by Sofie Van Landeghem.
PLOS ONE | 2013
Sofie Van Landeghem; Jari Björne; Chih Hsuan Wei; Kai Hakala; Sampo Pyysalo; Sophia Ananiadou; Hung Yu Kao; Zhiyong Lu; Tapio Salakoski; Yves Van de Peer; Filip Ginter
Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique genes and proteins and to broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/).
Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons – Attribution – Share Alike (CC BY-SA) license.
Plant Physiology | 2014
Hannes Claeys; Sofie Van Landeghem; Marieke Dubois; Katrien Maleux; Dirk Inzé
Responses to abiotic stress strongly depend on the stress level, and novel parameters, such as shoot growth inhibition and marker genes, are needed to accurately study and quantify mild stress responses. In vitro stress assays are commonly used to study the responses of plants to abiotic stress and to assess stress tolerance. A literature review reveals that most studies use very high stress levels and measure criteria such as germination, plant survival, or the development of visual symptoms such as bleaching. However, we show that these parameters are indicators of very severe stress, and such studies thus only provide incomplete information about stress sensitivity in Arabidopsis (Arabidopsis thaliana). Similarly, transcript analysis revealed that typical stress markers are only induced at high stress levels in young seedlings. Therefore, tools are needed to study the effects of mild stress. We found that the commonly used stress-inducing agents mannitol, sorbitol, NaCl, and hydrogen peroxide impact shoot growth in a highly specific and dose-dependent way. Therefore, shoot growth is a sensitive, relevant, and easily measured phenotype to assess stress tolerance over a wide range of stress levels. Finally, our data suggest that care should be taken when using mannitol as an osmoticum.
Environmental Microbiology | 2013
Klaas Vandepoele; Michiel Van Bel; Guilhem Richard; Sofie Van Landeghem; Bram Verhelst; Hervé Moreau; Yves Van de Peer; Nigel Grimsley; Gwenael Piganeau
With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichment analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains.
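The abstract does not state which statistic pico-PLAZA uses for its functional enrichment analysis. A common choice for this kind of analysis is a one-sided hypergeometric test, sketched below in Python under that assumption; the function name and its parameters are hypothetical and not part of pico-PLAZA.

```python
from math import comb


def enrichment_p_value(k: int, n: int, big_k: int, big_n: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    Hypothetical helper (not pico-PLAZA code):
      k:     genes in the user list that carry the annotation
      n:     size of the user gene list
      big_k: genes in the whole genome that carry the annotation
      big_n: total number of genes in the genome
    """
    total = comb(big_n, n)
    # Add up the ways to draw exactly i annotated genes, for i = k .. min(n, big_k).
    hits = sum(
        comb(big_k, i) * comb(big_n - big_k, n - i)
        for i in range(k, min(n, big_k) + 1)
    )
    return hits / total
```

For example, if 8 genes of a hypothetical 20-gene list carry a GO term that annotates 50 of 1,000 genes in the genome, `enrichment_p_value(8, 20, 50, 1000)` gives the probability of seeing at least 8 such genes by chance. In practice, results over many terms would also need multiple-testing correction.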
Advances in Bioinformatics | 2012
Sofie Van Landeghem; Kai Hakala; Samuel Rönnqvist; Tapio Salakoski; Yves Van de Peer; Filip Ginter
Technological advancements in the field of genetics have led not only to an abundance of experimental data but also to an exponential increase in the number of published biomolecular studies. Text mining is widely accepted as a promising technique to help researchers in the life sciences deal with the amount of available literature. This paper presents a freely available web application built on top of 21.3 million detailed biomolecular events extracted from all PubMed abstracts. These text mining results were generated by a state-of-the-art event extraction system and enriched with gene family associations and abstract generalizations, accounting for lexical variants and synonymy. The EVEX resource locates relevant literature on phosphorylation, regulation targets, binding partners, and several other biomolecular events and assigns confidence values to these events. The search function accepts official gene/protein symbols as well as common names from all species. Finally, the web application is a powerful tool for generating homology-based hypotheses as well as novel, indirect associations between genes and proteins such as coregulators.
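One way to read "indirect associations ... such as coregulators" is that two genes become linked when both regulate the same target. A minimal Python sketch of that idea follows, assuming the extracted regulation events are available as (regulator, target) pairs; the gene names are placeholders and this is not the EVEX implementation.

```python
from collections import defaultdict
from itertools import combinations


def find_coregulators(regulations):
    """Map each pair of regulators to the set of targets they share.

    `regulations` is an iterable of (regulator, target) pairs, for example
    taken from extracted regulation events. Illustrative sketch only.
    """
    regulators_of = defaultdict(set)  # target -> regulators of that target
    for regulator, target in regulations:
        regulators_of[target].add(regulator)

    shared = defaultdict(set)
    for target, regulators in regulators_of.items():
        # Every pair of distinct regulators of one target is a candidate coregulator pair.
        for a, b in combinations(sorted(regulators), 2):
            shared[(a, b)].add(target)
    return dict(shared)


# Hypothetical events: GeneA and GeneB both regulate GeneC; GeneA and GeneE both regulate GeneD.
events = [("GeneA", "GeneC"), ("GeneB", "GeneC"), ("GeneA", "GeneD"), ("GeneE", "GeneD")]
print(find_coregulators(events))
```

Sorting each regulator set before pairing keeps every pair in a single canonical order, so (GeneA, GeneB) and (GeneB, GeneA) are never counted separately.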
The Plant Cell | 2013
Sofie Van Landeghem; Stefanie De Bodt; Zuzanna Drebert; Dirk Inzé; Yves Van de Peer
Manual evaluation of state-of-the-art text mining data reveals promising results for its application in plant network biology. Focusing on Arabidopsis thaliana, an integrated network of text mining and experimental data highlights the complementarity of these resources and the necessity for text mining tools to uncover the latest relevant findings from the literature. Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein–protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, thereby stimulating the application of text mining data in future plant biology studies.
BMC Bioinformatics | 2016
Sofie Van Landeghem; Thomas Van Parys; Marieke Dubois; Dirk Inzé; Yves Van de Peer
Background: Differential networks have recently been introduced as a powerful way to study the dynamic rewiring capabilities of an interactome in response to changing environmental conditions or stimuli. Currently, such differential networks are generated and visualised using ad hoc methods, and are often limited to the analysis of only one condition-specific response or one interaction type at a time. Results: In this work, we present a generic, ontology-driven framework to infer, visualise and analyse an arbitrary set of condition-specific responses against one reference network. To this end, we have implemented novel ontology-based algorithms that can process highly heterogeneous networks, accounting for both physical interactions and regulatory associations, symmetric and directed edges, edge weights and negation. We propose this integrative framework as a standardised methodology that allows a unified view of differential networks and promotes comparability between differential network studies. As an illustrative application, we demonstrate its usefulness on a plant abiotic stress study and experimentally confirm a predicted regulator. Availability: Diffany is freely available as an open-source Java library and Cytoscape plugin from http://bioinformatics.psb.ugent.be/supplementary_data/solan/diffany/.
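As a much-simplified illustration of what a differential network encodes, the Python sketch below compares edge weights between a reference network and one condition-specific network and keeps the edges whose weight changes. It assumes directed edges stored as (source, target) keys with numeric weights and treats a missing edge as weight 0; the names are hypothetical. Diffany's ontology-based algorithms also handle interaction types, symmetric edges and negation, which this naive weight-difference approach does not attempt.

```python
def differential_network(reference, condition, min_change=0.0):
    """Naive differential network: per-edge weight change from reference to condition.

    reference, condition: dicts mapping (source, target) -> weight.
    Edges absent from a network count as weight 0. Only edges whose absolute
    change exceeds `min_change` are returned.
    """
    diff = {}
    for edge in set(reference) | set(condition):
        delta = condition.get(edge, 0.0) - reference.get(edge, 0.0)
        if abs(delta) > min_change:
            diff[edge] = delta
    return diff


# Hypothetical regulatory edges before and after a stress treatment.
ref = {("TF1", "G1"): 1.0, ("TF1", "G2"): 0.5}
stress = {("TF1", "G1"): 1.0, ("TF2", "G2"): 0.8}
print(differential_network(ref, stress))
```

Positive values mark edges that are gained or strengthened under the condition, negative values edges that are weakened or lost; unchanged edges such as TF1 to G1 drop out. Comparing several conditions against one reference, as the abstract describes, would mean calling this once per condition.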
BMC Bioinformatics | 2011
Yoshinobu Kano; Jari Björne; Filip Ginter; Tapio Salakoski; Ekaterina Buyko; Udo Hahn; K. Bretonnel Cohen; Karin Verspoor; Christophe Roeder; Lawrence Hunter; Halil Kilicoglu; Sabine Bergler; Sofie Van Landeghem; Thomas Van Parys; Yves Van de Peer; Makoto Miwa; Sophia Ananiadou; Mariana Neves; Alberto Pascual-Montano; Arzucan Özgür; Dragomir R. Radev; Sebastian Riedel; Rune Sætre; Hong-Woo Chun; Jin-Dong Kim; Sampo Pyysalo; Tomoko Ohta; Jun’ichi Tsujii
BACKGROUND Bio-molecular event extraction from literature is recognized as an important task of bio text mining and, as such, many relevant systems have been developed and made available during the last decade. While such systems provide useful services individually, there is a need for a meta-service to enable comparison and ensemble of such services, offering optimal solutions for various purposes. RESULTS We have integrated nine event extraction systems in the U-Compare framework, making them intercompatible and interoperable with other U-Compare components. The U-Compare event meta-service provides various meta-level features for comparison and ensemble of multiple event extraction systems. Experimental results show that the performance improvements achieved by the ensemble are significant. CONCLUSIONS While individual event extraction systems themselves provide useful features for bio text mining, the U-Compare meta-service is expected to improve the accessibility of the individual systems, and to enable meta-level uses over multiple event extraction systems such as comparison and ensemble.
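The abstract does not describe how the U-Compare ensemble combines systems. A simple voting ensemble is one common way to combine the outputs of several extractors; the Python sketch below assumes each system's output can be reduced to hashable event tuples such as (event type, trigger, theme), which is a simplification of real nested event structures. The function and example events are hypothetical.

```python
from collections import Counter


def vote_ensemble(system_outputs, min_votes=None):
    """Return the events predicted by at least `min_votes` systems.

    system_outputs: a list with one iterable of hashable event tuples per system.
    By default an event needs a strict majority of the systems.
    """
    if min_votes is None:
        min_votes = len(system_outputs) // 2 + 1
    votes = Counter()
    for events in system_outputs:
        votes.update(set(events))  # each system votes at most once per event
    return {event for event, n in votes.items() if n >= min_votes}


# Hypothetical outputs of three extractors for one sentence.
sys_a = [("Phosphorylation", "phosphorylates", "GeneB")]
sys_b = [("Phosphorylation", "phosphorylates", "GeneB"), ("Binding", "binds", "GeneC")]
sys_c = [("Binding", "binds", "GeneD")]
print(vote_ensemble([sys_a, sys_b, sys_c]))
```

Raising `min_votes` trades recall for precision; setting it to 1 returns the union of all systems' predictions.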
BMC Bioinformatics | 2010
Thomas Abeel; Sofie Van Landeghem; Roser Morante; Vincent Van Asch; Yves Van de Peer; Walter Daelemans; Yvan Saeys
This meeting report gives an overview of the keynote lectures, the panel discussion and a selection of the contributed presentations. The workshop was held in Gent, Belgium, on May 10-11. It featured a tutorial aimed towards a broad audience of (computational) biologists, (computational) linguists and researchers working purely on text mining.
Computational Intelligence | 2011
Sofie Van Landeghem; Bernard De Baets; Yves Van de Peer; Yvan Saeys
We have developed a machine learning framework to accurately extract complex genetic interactions from text. Employing type‐specific classifiers, this framework processes research articles to extract various biological events. Subsequently, the algorithm identifies regulation events that take other events as arguments, allowing a nested structure of predictions. All predictions are merged into an integrated network, useful for visualization and for deduction of new biological knowledge.
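Nested predictions mean that a regulation event can take another event, rather than only a gene or protein, as its argument. The Python data structure below is a hypothetical representation loosely modeled on the Theme/Cause argument roles used in BioNLP Shared Task event annotation; the class and function names are illustrative and are not the framework's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A gene or protein mention."""
    name: str


@dataclass(frozen=True)
class Event:
    """An event whose arguments are entities or, for regulations, other events."""
    type: str
    trigger: str
    # (role, argument) pairs, e.g. ("Theme", Entity("GeneB")) or ("Theme", Event(...))
    arguments: Tuple[Tuple[str, Union[Entity, Event]], ...] = ()


def nesting_depth(event: Event) -> int:
    """1 for a flat event, plus one level for each layer of nested event arguments."""
    child_depths = [nesting_depth(arg) for _, arg in event.arguments if isinstance(arg, Event)]
    return 1 + max(child_depths, default=0)


def involved_entities(event: Event) -> set:
    """Collect all entity names reachable through (possibly nested) arguments."""
    names = set()
    for _, arg in event.arguments:
        if isinstance(arg, Entity):
            names.add(arg.name)
        else:
            names |= involved_entities(arg)
    return names


# Hypothetical sentence: "GeneA regulates the phosphorylation of GeneB."
phos = Event("Phosphorylation", "phosphorylation", (("Theme", Entity("GeneB")),))
reg = Event("Regulation", "regulates", (("Cause", Entity("GeneA")), ("Theme", phos)))
print(nesting_depth(reg), sorted(involved_entities(reg)))
```

Flattening a nested event to the entities it involves, as `involved_entities` does, is one simple way such predictions could be merged into a gene-level network.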
Bioinformatics | 2015
Suwisa Kaewphan; Sofie Van Landeghem; Tomoko Ohta; Yves Van de Peer; Filip Ginter; Sampo Pyysalo
Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus. Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers. Availability and implementation: The manually annotated datasets, the cell line dictionary, derived corpora, NERsuite models and the results of the large-scale run on unannotated texts are available under open licenses at http://turkunlp.github.io/Cell-line-recognition/.
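The reported F-scores combine precision and recall as their harmonic mean, F = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN). A small Python helper showing that arithmetic from true-positive, false-positive and false-negative counts follows; the function name and the counts in the usage line are hypothetical, and how predicted mentions are matched to gold annotations (for example, exact span boundaries) is an assumption the abstract does not specify.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from mention-level counts.

    F1 is computed as 2*TP / (2*TP + FP + FN), which equals the harmonic
    mean 2*P*R / (P + R) whenever P + R > 0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 90 correct mentions, 10 spurious, 30 missed.
print(precision_recall_f1(90, 10, 30))
```

Here precision is 0.9, recall is 0.75 and the F-score is about 0.818, lower than the arithmetic mean of the two because the harmonic mean penalizes the weaker of the two measures.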