Publication


Featured research published by Thomas Wächter.


Briefings in Bioinformatics | 2008

Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies?

Rainer Winnenburg; Thomas Wächter; Conrad Plake; Andreas Doms; Michael Schroeder

The biomedical literature can be seen as a large integrated, but unstructured data repository. Extracting facts from literature and making them accessible is approached from two directions: manual curation efforts develop ontologies and vocabularies to annotate gene products based on statements in papers. Text mining aims to automatically identify entities and their relationships in text using information retrieval and natural language processing techniques. Manual curation is highly accurate but time consuming, and does not scale with the ever increasing growth of literature. Text mining as a high-throughput computational technique scales well, but is error-prone due to the complexity of natural language. How can both be married to combine scalability and accuracy? Here, we review the state-of-the-art text mining approaches that are relevant to annotation and discuss available online services analysing biomedical literature by means of text mining techniques, which could also be utilised by annotation projects. We then examine how far text mining has already been utilised in existing annotation projects and conclude how these techniques could be tightly integrated into the manual annotation process through novel authoring systems to scale-up high-quality manual curation.


Bioinformatics | 2010

Semi-automated ontology generation within OBO-Edit

Thomas Wächter; Michael Schroeder

Motivation: Ontologies and taxonomies have proven highly beneficial for biocuration. The Open Biomedical Ontology (OBO) Foundry alone lists over 90 ontologies mainly built with OBO-Edit. Creating and maintaining such ontologies is a labour-intensive, difficult, manual process. Automating parts of it is of great importance for the further development of ontologies and for biocuration. Results: We have developed the Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG), a system which supports the creation and extension of OBO ontologies by semi-automatically generating terms, definitions and parent–child relations from text in PubMed, the web and PDF repositories. DOG4DAG is seamlessly integrated into OBO-Edit. It generates terms by identifying statistically significant noun phrases in text. For definitions and parent–child relations it employs pattern-based web searches. We systematically evaluate each generation step using manually validated benchmarks. The term generation leads to high-quality terms also found in manually created ontologies. Up to 78% of definitions are valid and up to 54% of child–ancestor relations can be retrieved. There is no other validated system that achieves comparable results. By combining the prediction of high-quality terms, definitions and parent–child relations with the ontology editor OBO-Edit we contribute a thoroughly validated tool for all OBO ontology engineers. Availability: DOG4DAG is available within OBO-Edit 2.1 at http://www.oboedit.org Contact: [email protected]; Supplementary Information: Supplementary data are available at Bioinformatics online.


Dagstuhl Seminar Proceedings | 2009

GoPubMed: Exploring PubMed with Ontological Background Knowledge

Heiko Dietze; Dimitra Alexopoulou; Michael R. Alvers; Liliana Barrio-Alvers; Bill Andreopoulos; Andreas Doms; Jörg Hakenberg; Jan Mönnich; Conrad Plake; Andreas Reischuck; Loïc Royer; Thomas Wächter; Matthias Zschunke; Michael Schroeder

With the ever-increasing size of the scientific literature, finding relevant documents and answering questions has become even more of a challenge. Recently, ontologies (hierarchical, controlled vocabularies) have been introduced to annotate genomic data. They can also improve question answering and the selection of relevant documents in literature search. Search engines such as GoPubMed.org use ontological background knowledge to give an overview of large query results and to answer questions. We review the problems and solutions underlying these next-generation intelligent search engines and give examples of the power of this new search paradigm.


BMC Bioinformatics | 2008

Terminologies for text-mining; an experiment in the lipoprotein metabolism domain

Dimitra Alexopoulou; Thomas Wächter; Laura Pickersgill; Cecilia Eyre; Michael Schroeder

Background: The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and intuition. However, there exist a few efforts on automatic construction of ontologies in the form of extracted lists of terms and relations between them. Results: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the automatically derived terminology from four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms the best method still generates 51% relevant terms. In a corpus of 3066 documents 53% of LMO terms are contained and 38% can be generated with one of the methods. Conclusions: Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way, taking ATR results as input and following the guidelines we described. Availability: The TFIDF term recognition is available as a Web Service, described at http://gopubmed4.biotec.tu-dresden.de/IdavollWebService/services/CandidateTermGeneratorService?wsdl
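As a rough illustration of the evaluation described in this abstract, the sketch below ranks candidate terms with a simple TF-IDF score and measures the share of relevant terms among the top k. It is not the authors' web service or one of their four ATR methods: the single-word tokenisation, the smoothed IDF formula, and the names `tfidf_rank` and `precision_at_k` are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_rank(domain_docs, background_docs):
    """Rank single-word candidate terms (hypothetical helper, not the paper's method).

    A term scores high when it is frequent in the domain corpus and
    appears in few documents of the background corpus.
    """
    # Term frequency over the whole domain corpus.
    tf = Counter(w for doc in domain_docs for w in doc.lower().split())
    # Document frequency in the background corpus.
    df = Counter(w for doc in background_docs for w in set(doc.lower().split()))
    n_bg = len(background_docs)
    # Smoothed IDF so that unseen terms do not divide by zero.
    scores = {w: count * math.log((1 + n_bg) / (1 + df[w])) for w, count in tf.items()}
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked terms that a curator marked as relevant."""
    top = ranked[:k]
    return sum(1 for term in top if term in relevant) / len(top) if top else 0.0
```

With a manually curated set of relevant terms, `precision_at_k(ranked, relevant, 50)` corresponds to the kind of "top 50" figure the abstract reports.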


Semantic Web Applications and Tools for Life Sciences | 2011

DOG4DAG: semi-automated ontology generation in OBO-Edit and Protégé

Thomas Wächter; Götz Fabian; Michael Schroeder

In the biomedical domain, Protégé and OBO-Edit are the main ontology editors supporting the manual construction of ontologies. Since manual creation is a laborious and hence costly process, there have been efforts to automate parts of this process. Here, we give a demo of the capabilities of DOG4DAG, the Dresden Ontology Generator for Directed Acyclic Graphs, which is available as a plugin to both OBO-Edit and Protégé. In the demo, we describe how to generate terms and in particular siblings, definitions, and is-a relationships using an example in the domain of nervous system diseases. We summarise the strengths and limits of the different steps of the generation process.


Winter Simulation Conference | 2006

A corpus-driven approach for design, evolution and alignment of ontologies

Thomas Wächter; André Wobst; Michael Schroeder; He Tan; Patrick Lambrix

Bio-ontologies are hierarchical vocabularies, which are used to annotate other data sources such as sequence and structure databases. With the wide use of ontologies, their integration, design, and evolution become an important problem. We show how text mining on relevant text corpora can be used to identify matching ontology terms of two separate ontologies and to propose new ontology terms for a given term. We evaluate these approaches on the Gene Ontology.


ACM Symposium on Applied Computing | 2006

Two-phase clustering strategy for gene expression data sets

Dirk Habich; Thomas Wächter; Wolfgang Lehner; Christian Pilarsky

In the context of genome research, the method of gene expression analysis has been used for several years. Related microarray experiments are conducted all over the world, and consequently, a vast number of microarray data sets is produced. Having access to this variety of repositories, researchers would like to incorporate this data in their analyses to increase the statistical significance of their results. In this paper, we present a new two-phase clustering strategy which is based on the combination of local clustering results to obtain a global clustering. The advantage of such a technique is that each microarray data set can be normalized and clustered separately. The set of different relevant local clustering results is then used to calculate the global clustering result. Furthermore, we present an approach based on technical as well as biological quality measures to determine weighting factors that quantify the share of each local result within the global result. The better the attested quality of the local results, the stronger their impact on the global result.
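The idea of combining weighted local clusterings into a global one can be sketched as follows. The abstract does not specify the combination rule, so this example swaps in a generic weighted co-association vote: two genes are merged when the weighted share of data sets that cluster them together reaches a threshold. The function name `global_clustering`, the `threshold` parameter, and the use of connected components are assumptions made for illustration only.

```python
from itertools import combinations


def global_clustering(local_results, weights, threshold=0.5):
    """Combine local clusterings into one global clustering (illustrative sketch).

    local_results: list of dicts mapping gene -> local cluster label,
                   one dict per separately clustered data set.
    weights:       one quality weight per data set (higher = more trusted).
    """
    genes = sorted(set().union(*(r.keys() for r in local_results)))
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        agree = total = 0.0
        for labels, w in zip(local_results, weights):
            if a in labels and b in labels:
                total += w
                if labels[a] == labels[b]:
                    agree += w
        # Merge the pair when the weighted vote reaches the threshold.
        if total > 0 and agree / total >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(sorted(c) for c in clusters.values())
```

Because the vote is weighted, changing the quality weights can change the global result even when the local clusterings stay the same, which mirrors the abstract's point that better-rated local results have a stronger impact.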


Bioinformatics | 2012

Extending ontologies by finding siblings using set expansion techniques

Götz Fabian; Thomas Wächter; Michael Schroeder

Motivation: Ontologies are an everyday tool in biomedicine to capture and represent knowledge. However, many ontologies lack a high degree of coverage in their domain and need to improve their overall quality and maturity. Automatically extending sets of existing terms will enable ontology engineers to systematically improve text-based ontologies level by level. Results: We developed an approach to extend ontologies by discovering new terms which are in a sibling relationship to existing terms of an ontology. For this purpose, we combined two approaches which retrieve new terms from the web. The first approach extracts siblings by exploiting the structure of HTML documents, whereas the second approach uses text mining techniques to extract siblings from unstructured text. Our evaluation against MeSH (Medical Subject Headings) shows that our method for sibling discovery is able to suggest first-class ontology terms and can be used as an initial step towards assessing the completeness of ontologies. The evaluation yields a recall of 80% at a precision of 61% where the two independent approaches are complementing each other. For MeSH in particular, we show that it can be considered complete in its medical focus area. We integrated the work into DOG4DAG, an ontology generation plugin for the editors OBO-Edit and Protégé, making it the first plugin that supports sibling discovery on-the-fly. Availability: Sibling discovery for ontology is available as part of DOG4DAG (www.biotec.tu-dresden.de/research/schroeder/dog4dag) for both Protégé 4.1 and OBO-Edit 2.1. Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
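To make the first of the two approaches more concrete, here is a minimal sketch of sibling discovery from HTML structure: if a list on a web page contains several known terms, its remaining items are proposed as sibling candidates. This is an illustration under stated assumptions, not the DOG4DAG implementation. It only looks at `<ul>`/`<ol>` lists, and `ListCollector`, `find_siblings`, and `min_seeds` are hypothetical names introduced here.

```python
from html.parser import HTMLParser


class ListCollector(HTMLParser):
    """Collects the text of <li> items, grouped by their enclosing list."""

    def __init__(self):
        super().__init__()
        self.lists = []   # finished lists, each a list of item strings
        self._open = []   # stack of [items, inside_li] to handle nesting

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._open.append([[], False])
        elif tag == "li" and self._open:
            self._open[-1][0].append("")
            self._open[-1][1] = True

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self._open:
            items, _ = self._open.pop()
            self.lists.append([i.strip() for i in items if i.strip()])
        elif tag == "li" and self._open:
            self._open[-1][1] = False

    def handle_data(self, data):
        # Only keep text that sits directly inside an open list item.
        if self._open and self._open[-1][1]:
            self._open[-1][0][-1] += data


def find_siblings(html, seeds, min_seeds=2):
    """Return list items that co-occur with at least `min_seeds` seed terms."""
    parser = ListCollector()
    parser.feed(html)
    parser.close()
    seeds_lower = {s.lower() for s in seeds}
    candidates = set()
    for items in parser.lists:
        if len({i.lower() for i in items} & seeds_lower) >= min_seeds:
            candidates |= {i for i in items if i.lower() not in seeds_lower}
    return sorted(candidates)
```

Requiring at least two seed terms per list is a simple guard against navigation menus and other unrelated lists that happen to mention a single known term.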


Regulatory Toxicology and Pharmacology | 2011

A knowledge-based search engine to navigate the information thicket of nanotoxicology

Ursula G. Sauer; Carsten Kneuer; Jutta Tentschert; Thomas Wächter; Michael Schroeder; Daniel Butzke; Andreas Luch; Manfred Liebsch; Barbara Grune; Mario Götz

The risk assessment of nano-sized materials (NM) currently suffers from great uncertainties regarding their putative toxicity for humans and the environment. An extensive amount of the respective original research literature has to be evaluated before targeted and hypothesis-driven Environmental and Health Safety research can be stipulated. Furthermore, to comply with the European animal protection legislation, in vitro testing has to be preferred whenever possible. Against this background, there is a need for tools that enable producers of NM and risk assessors to retrieve data quickly and comprehensively, thereby linking the 3Rs principle to the hazard identification of NM. Here we report on the development of a knowledge-based search engine that is tailored to the particular needs of risk assessors in the area of NM. Comprehensive retrieval of data from studies utilising in vitro as well as in vivo methods, relying on the PubMed database, is presented by way of a titanium dioxide case study. Fast, relevant and reliable information retrieval is of paramount importance for the scientific community dedicated to developing safe NM in various product areas, and for risk assessors obliged to identify data gaps, to define additional data requirements for approval of NM and to create strategies for integrated testing using alternative methods.


Lecture Notes in Computer Science | 2006

Ontologies and Text Mining as a Basis for a Semantic Web for the Life Sciences

Andreas Doms; Vaida Jakoniené; Patrick Lambrix; Michael Schroeder; Thomas Wächter

The life sciences are a promising application area for semantic web technologies as there are large online structured and unstructured data repositories and ontologies, which structure this knowledge. We briefly give an overview of biomedical ontologies and show how they can help to locate, retrieve, and integrate biomedical data. Annotating literature with ontology terms is an important problem to support such ontology-based searches. We review the steps involved in this text mining task and introduce the ontology-based search engine GoPubMed. As the underlying data sources evolve, so do the ontologies. We give a brief overview of different approaches supporting the semi-automatic evolution of ontologies.

Collaboration


Dive into Thomas Wächter's collaborations.

Top Co-Authors

Michael Schroeder, Dresden University of Technology
Andreas Doms, Dresden University of Technology
Dimitra Alexopoulou, Dresden University of Technology
Michael R. Alvers, Dresden University of Technology
Heiko Dietze, Lawrence Berkeley National Laboratory
Andreas Luch, Federal Institute for Risk Assessment
Barbara Grune, Federal Institute for Risk Assessment
Christian Pilarsky, Dresden University of Technology
Conrad Plake, Dresden University of Technology