Publications


Featured research published by Georg Fette.


Natural Language Engineering | 2016

UIMA Ruta: Rapid development of rule-based information extraction applications

Peter Kluegl; Martin Toepfer; Philip-Daniel Beck; Georg Fette; Frank Puppe

Rule-based information extraction is an important approach for processing the growing amount of available unstructured data. The manual creation of rule-based applications is a time-consuming and tedious task that requires qualified knowledge engineers. The costs of this process can be reduced by providing a suitable rule language and extensive tooling support. This paper presents UIMA Ruta, a tool for rule-based information extraction and text processing applications. The system was designed with a focus on rapid development. The rule language and its matching paradigm facilitate the quick specification of comprehensible extraction knowledge. They support a compact representation while still providing a high level of expressiveness. These advantages are complemented by the development environment UIMA Ruta Workbench. It provides, in addition to extensive editing support, essential assistance for the explanation of rule execution, introspection, automatic validation, and rule induction. Its open source license makes UIMA Ruta a useful tool for both academia and industry. We compare UIMA Ruta to related rule-based systems, in particular concerning the compactness of the rule representation, the expressiveness, and the provided tooling support. The competitiveness of the runtime performance is shown in relation to a popular and freely available system. A selection of case studies implemented with UIMA Ruta illustrates the usefulness of the system in real-world scenarios.
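The matching paradigm the abstract describes, declarative rules that map patterns in text to typed annotations, can be illustrated with a toy sketch. The Python snippet below is not UIMA Ruta syntax (Ruta is its own rule language on top of Apache UIMA); it only mimics the idea of a compact, declarative rule set, and all rule names and patterns are invented for illustration.

```python
import re

# Toy declarative rule set: each rule maps a pattern to an annotation type.
# This mimics the style of rule-based extraction only; it is NOT UIMA Ruta.
RULES = [
    ("Month", re.compile(r"\b(January|February|March|April|May|June|July|"
                         r"August|September|October|November|December)\b")),
    ("Year",  re.compile(r"\b(19|20)\d{2}\b")),
]

def annotate(text):
    """Apply every rule and collect (type, start, end, covered_text) tuples."""
    spans = []
    for ann_type, pattern in RULES:
        for m in pattern.finditer(text):
            spans.append((ann_type, m.start(), m.end(), m.group()))
    return sorted(spans, key=lambda s: s[1])

print(annotate("Accepted in January 2016, published in March 2016."))
```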


international conference on artificial neural networks | 2005

Short term memory and pattern matching with simple echo state networks

Georg Fette; Julian Eggert

Two approaches to recognizing temporal patterns have recently been proposed: the Echo State Network (ESN) by Jaeger and the Liquid State Machine (LSM) by Maass. The ESN approach assumes a sort of “black-box” operability of the networks and claims broad applicability to several different problems using the same principle. Here we propose a simplified version of ESNs, which we call the Simple Echo State Network (SESN), which exhibits good results in memory capacity and pattern matching tasks and allows a better understanding of the capabilities and restrictions of ESNs.
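For readers unfamiliar with echo state networks, the sketch below implements a generic ESN on the short-term memory task mentioned in the abstract: a fixed random reservoir with spectral radius below one and a linear readout trained by ridge regression to recall a delayed input. It is a standard textbook ESN, not the paper's SESN simplification, and all sizes and constants are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, delay = 100, 2000, 5          # reservoir size, timesteps, recall delay

# Fixed random sparse reservoir, rescaled to spectral radius < 1 so the
# network has the echo state property (inputs fade rather than explode).
W = rng.normal(size=(N, N)) * (rng.random((N, N)) < 0.1)
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()
W_in = rng.uniform(-0.5, 0.5, size=N)

u = rng.uniform(-1, 1, size=T)      # random scalar input signal
X = np.zeros((T, N))                # collected reservoir states
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W @ x + W_in * u[t])
    X[t] = x

# Linear readout trained by ridge regression to recall u(t - delay),
# a standard short-term-memory task for echo state networks.
y = np.roll(u, delay)
X_tr, y_tr = X[delay:], y[delay:]
W_out = np.linalg.solve(X_tr.T @ X_tr + 1e-6 * np.eye(N), X_tr.T @ y_tr)

print("recall correlation:", np.corrcoef(X_tr @ W_out, y_tr)[0, 1])
```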


BMC Medical Informatics and Decision Making | 2015

Fine-grained information extraction from German transthoracic echocardiography reports

Martin Toepfer; Hamo Corovic; Georg Fette; Peter Klügl; Stefan Störk; Frank Puppe

Background: Information extraction techniques that derive structured representations from unstructured data make a large amount of clinically relevant information about patients accessible for semantic applications. These methods typically rely on standardized terminologies that guide the process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports had not been targeted sufficiently before, despite their importance for clinical trials. This work therefore aimed at the development and evaluation of an information extraction component with a fine-grained terminology that enables the recognition of almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg.

Methods: A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute-value pairs. The final component was mapped to the central elements of a standardized terminology and evaluated on documents with different layouts.

Results: The final system achieved state-of-the-art precision (micro average .996) and recall (micro average .961) on 100 test documents that represent more than 90% of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with F1 = .989 (micro average) and F1 = .963 (macro average). As a result of keyword matching and restrained concept extraction, the system also obtained high precision on unstructured or exceptionally short documents and on documents with uncommon layouts.

Conclusions: The developed terminology and the proposed information extraction system allow fine-grained information to be extracted from German semi-structured transthoracic echocardiography reports with very high precision and high recall for the majority of documents at the University Hospital of Würzburg. The extracted results populate a clinical data warehouse that supports clinical research.
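The core idea, a terminology that maps report phrases to attributes whose values are read from the surrounding context, can be sketched as follows. The miniature terminology, the grading vocabulary, and the example sentence are invented for illustration and are not taken from the paper's resources; real echo reports additionally need negation handling and far richer context rules.

```python
import re

# Hypothetical miniature terminology: attribute -> trigger phrases. The
# German wording is illustrative only, not the paper's actual resource.
TERMINOLOGY = {
    "Mitralklappeninsuffizienz": ["Mitralklappeninsuffizienz"],
    "Aortenklappenstenose": ["Aortenklappenstenose"],
}
GRADES = {"leichte": "mild", "mittelgradige": "moderate", "hochgradige": "severe"}

def extract(report):
    """Return attribute-value pairs found in a toy echo report."""
    pairs = []
    for attribute, phrases in TERMINOLOGY.items():
        for phrase in phrases:
            for m in re.finditer(re.escape(phrase), report):
                # Look one token to the left for a grading adjective.
                left = report[:m.start()].split()
                grade = GRADES.get(left[-1].lower()) if left else None
                pairs.append((attribute, grade or "present"))
    return pairs

print(extract("Hochgradige Mitralklappeninsuffizienz. Leichte Aortenklappenstenose."))
# -> [('Mitralklappeninsuffizienz', 'severe'), ('Aortenklappenstenose', 'mild')]
```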


Methods of Information in Medicine | 2016

Data Linkage from Clinical to Study Databases via an R Data Warehouse User Interface. Experiences from a Large Clinical Follow-up Study.

Mathias Kaspar; Maximilian Ertl; Georg Fette; Georg Dietrich; M. Toepfer; C. Angermann; Stefan Störk; Frank Puppe

Background: Data that need to be documented for clinical studies have often already been acquired and documented in clinical routine. Usually these data are manually transferred to case report forms (CRFs) and/or entered directly into an electronic data capture (EDC) system.

Objectives: To enhance the documentation process of a large clinical follow-up study targeting patients admitted for acutely decompensated heart failure by accessing the data created during routine and study visits from a hospital information system (HIS) and by transferring it via a data warehouse (DWH) into the study's EDC system.

Methods: This project is based on the clinical DWH developed at the University of Würzburg. The DWH was extended by several new data domains, including data created by the study team itself. An R user interface was developed for the DWH that allows users to access its source data in full detail, to transform data as comprehensively as possible into study-specific variables using R, and to support the creation of data and catalog tables.

Results: A data flow was established that starts with labeling patients as study patients within the HIS and proceeds with updating the DWH with this label and further data domains at a daily rate. Several study-specific variables were defined using the implemented R user interface of the DWH. This system was then used to export these variables as data tables ready for import into our EDC system. The data tables were used to initialize the first 296 patients within the EDC system by pseudonym, visit, and data values. Afterwards, these records were filled with clinical data on heart failure, vital parameters, and time spent on selected wards.

Conclusions: This solution focuses on comprehensive access to and transformation of data for a DWH-EDC system linkage. Using the system in a large clinical study has demonstrated the feasibility of this approach for a study with a complex visit schedule.
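The paper's interface is implemented in R; as a rough analogue, the following pandas sketch shows the kind of transformation described: deriving a study-specific variable from raw DWH rows and pivoting it into a per-visit table ready for EDC import. All table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical DWH extract: one row per lab value per patient (column names
# are made up for illustration; the paper's actual interface is in R).
dwh = pd.DataFrame({
    "pseudonym": ["P001", "P001", "P002"],
    "visit":     ["baseline", "followup", "baseline"],
    "analyte":   ["NT-proBNP", "NT-proBNP", "NT-proBNP"],
    "value":     [1850.0, 920.0, 2400.0],
})

# Derive a study-specific variable and pivot into the wide per-visit layout
# an EDC import typically expects.
table = (dwh.query("analyte == 'NT-proBNP'")
            .pivot(index="pseudonym", columns="visit", values="value")
            .rename(columns=lambda v: f"ntprobnp_{v}"))
table.to_csv("edc_import.csv")   # data table ready for EDC import
print(table)
```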


international conference on computational linguistics | 2014

Integrated Tools for Query-driven Development of Light-weight Ontologies and Information Extraction Components

Martin Toepfer; Georg Fette; Philip-Daniel Beck; Peter Kluegl; Frank Puppe

This paper reports on a user-friendly terminology and information extraction development environment that integrates into existing infrastructure for natural language processing and aims to close a gap in the UIMA community. The tool supports domain experts in data-driven and manual terminology refinement and refactoring. It can propose new concepts and simple relations, and it includes an information extraction algorithm that considers the context of terms for disambiguation. With its tight integration of easy-to-use and technical tools for component development and resource management, the system is specifically designed to shorten the time necessary for domain adaptation of such text processing components. Search support provided by the tool fosters this aspect and is helpful for building natural language processing modules in general. Specialized queries are included to speed up several tasks, for example, the detection of new terms and concepts, or simple quality estimation without gold standard documents. The development environment is modular and extensible by using Eclipse and the Apache UIMA framework. This paper describes the system’s architecture and features with a focus on search support. Notably, this paper proposes a generic middleware component for queries in a UIMA based workbench.
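One of the specialized queries mentioned, detecting term candidates not yet covered by the terminology, can be approximated with simple corpus statistics. The sketch below is an illustration of the idea only, with an invented mini-terminology; the paper's actual proposals are computed inside its UIMA-based workbench.

```python
from collections import Counter
import re

# Invented mini-terminology; in practice this would be the project's resource.
terminology = {"heart", "failure", "ventricle"}

def propose_new_terms(documents, top_n=5):
    """Rank frequent corpus tokens that the terminology does not cover yet."""
    counts = Counter()
    for doc in documents:
        for token in re.findall(r"[a-zäöüß]+", doc.lower()):
            if token not in terminology and len(token) > 3:
                counts[token] += 1
    return counts.most_common(top_n)

docs = ["Left ventricle dilated, severe regurgitation.",
        "Mild regurgitation, no stenosis."]
print(propose_new_terms(docs))   # 'regurgitation' ranks first here
```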


Clinical Research in Cardiology | 2018

Underestimated prevalence of heart failure in hospital inpatients: a comparison of ICD codes and discharge letter information

Mathias Kaspar; Georg Fette; Gülmisal Güder; Lea Seidlmayer; Maximilian Ertl; Georg Dietrich; Helmut Greger; Frank Puppe; Stefan Störk

Background: Heart failure is the predominant cause of hospitalization and among the leading causes of death in Germany. However, accurate estimates of its prevalence and incidence are lacking, as reported figures originating from different information sources are compromised by factors such as economic incentives and documentation quality.

Methods: We implemented a clinical data warehouse that integrates various information sources (structured parameters, plain text, and data extracted by natural language processing) and enables reliable approximations of the true number of heart failure patients. The performance of ICD-based diagnosis in detecting heart failure was compared across the years 2000–2015 with (a) advanced definitions based on algorithms that integrate various sources of the hospital information system, and (b) a physician-based reference standard.

Results: Applying these methods to detect heart failure in inpatients revealed that relying on ICD codes resulted in a marked underestimation of the true prevalence of heart failure, ranging from 44% in the validation dataset to 55% (single year) and 31% (all years) in the overall analysis. Percentages changed over the years, indicating secular changes in coding practice and efficiency. Performance was markedly improved using search and permutation algorithms, from the initial expert-specified query (F1 score of 81%) to the computer-optimized query (F1 score of 86%), or, alternatively, by optimizing precision or sensitivity depending on the search objective.

Conclusions: Estimating the prevalence of heart failure using ICD codes as the sole data source yielded unreliable results. Diagnostic accuracy was markedly improved using dedicated search algorithms. Our approach may be transferred to other hospital information systems.
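The study's central comparison, case detection by ICD codes alone versus queries that also use discharge-letter text, scored against a physician reference standard, can be mimicked on toy data as below. The case table and the resulting F1 scores are fabricated for illustration and do not reproduce the paper's figures.

```python
# Score two toy detection rules against a physician reference standard.
def f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

cases = [
    # (icd_says_hf, letter_mentions_hf, physician_reference)
    (True,  True,  True),
    (False, True,  True),   # missed by coding, caught in the letter
    (False, False, False),
    (True,  False, False),  # coded, but not confirmed by the physician
    (False, True,  True),
]

for name, rule in [("ICD only", lambda icd, txt: icd),
                   ("ICD or letter", lambda icd, txt: icd or txt)]:
    tp = sum(rule(i, t) and ref for i, t, ref in cases)
    fp = sum(rule(i, t) and not ref for i, t, ref in cases)
    fn = sum(not rule(i, t) and ref for i, t, ref in cases)
    print(name, "F1 =", round(f1(tp, fp, fn), 2))
```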


Methods of Information in Medicine | 2018

Ad Hoc Information Extraction for Clinical Data Warehouses

Georg Dietrich; Jonathan Krebs; Georg Fette; Maximilian Ertl; Mathias Kaspar; Stefan Störk; Frank Puppe

Background: Clinical data warehouses (CDWs) reuse electronic health records (EHRs) to make their data retrievable for research purposes or for patient recruitment in clinical trials. However, much information is hidden in unstructured data such as discharge letters. These can be preprocessed and converted to structured data via information extraction (IE), which is unfortunately a laborious task and therefore usually not available for most of the text data in a CDW.

Objectives: The goal of our work is to provide an ad hoc IE service that allows users to query text data in a manner similar to querying structured data in a CDW. While search engines just return text snippets, our system also returns frequencies (e.g., how many patients exist with “heart failure”, including textual synonyms, or how many patients have an LVEF < 45) based on the content of discharge letters or textual reports for special investigations such as echocardiography. Three subtasks are addressed: (1) recognizing and excluding negations and their scopes, (2) extracting concepts, i.e., Boolean values, and (3) extracting numerical values.

Methods: We implemented an extended version of the NegEx algorithm for German texts that detects negations and determines their scope. Furthermore, our document-oriented CDW PaDaWaN was extended with query functions, e.g., context-sensitive queries and regex queries, and with an extraction mode for computing the frequencies of Boolean and numerical values.

Results: Evaluations on chest X-ray reports and discharge letters showed high F1 scores for the three subtasks: detection of negated concepts in chest X-ray reports with an F1 score of 0.99 and in discharge letters with 0.97; of Boolean values in chest X-ray reports around 0.99; and of numerical values in chest X-ray reports and discharge letters also around 0.99, with the exception of the concept age.

Discussion: The advantages of ad hoc IE over standard IE are the low development effort (just entering the concept with its variants), the promptness of the results, and the adaptability by users to their particular questions. Disadvantages are usually lower accuracy and confidence. This ad hoc information extraction approach is novel and exceeds existing systems: Roogle [1] extracts predefined concepts from texts during preprocessing and makes them retrievable at runtime. Dr. Warehouse [2] applies negation detection and indexes the produced subtexts that include affirmed findings. Our approach combines negation detection and the extraction of concepts, but the extraction does not take place during preprocessing; it happens at runtime. This provides ad hoc, dynamic, interactive, and adjustable information extraction of arbitrary concepts, and even their values, on the fly at runtime.

Conclusions: We developed an ad hoc information extraction query feature for Boolean and numerical values within a CDW with high recall and precision, based on a pipeline that detects and removes negations and their scope in clinical texts.
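Two of the building blocks described, NegEx-style negation detection and numeric value extraction, can be sketched in a few lines. This is a heavily simplified illustration (the trigger list is tiny and the scope rule is just "trigger anywhere before the concept in the sentence"), not the paper's extended German NegEx or the PaDaWaN query engine.

```python
import re

# Minimal sketch: NegEx-style negation detection with German trigger words,
# plus extraction of a number following a concept (e.g. LVEF < 45 queries).
NEG_TRIGGERS = ["kein", "keine", "ohne", "nicht"]

def is_negated(sentence, concept):
    """True if a negation trigger precedes the concept in the sentence."""
    pos = sentence.lower().find(concept.lower())
    if pos < 0:
        return False
    prefix = sentence.lower()[:pos]
    return any(re.search(rf"\b{t}\b", prefix) for t in NEG_TRIGGERS)

def extract_value(sentence, concept):
    """Extract a number following the concept, e.g. 'LVEF 42 %' -> 42.0."""
    m = re.search(rf"{concept}\D{{0,10}}(\d+(?:[.,]\d+)?)", sentence, re.I)
    return float(m.group(1).replace(",", ".")) if m else None

reports = ["Kein Pleuraerguss nachweisbar.", "LVEF 42 %.", "Pleuraerguss links."]
for s in reports:
    print(s, "| negated:", is_negated(s, "Pleuraerguss"),
          "| LVEF:", extract_value(s, "LVEF"))
```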


Clinical Research in Cardiology | 2016

Prevalence of severe mitral regurgitation eligible for edge-to-edge mitral valve repair (MitraClip).

Julia Wallenborn; Stefan Störk; Sebastian Herrmann; Olga Kukuy; Georg Fette; Frank Puppe; Armin Gorski; Kai Hu; Wolfram Voelker; Georg Ertl; Frank Weidemann


GI-Jahrestagung | 2012

Information Extraction from Unstructured Electronic Health Records and Integration into a Data Warehouse.

Georg Fette; Maximilian Ertl; Anja Wörner; Peter Klügl; Stefan Störk; Frank Puppe


international conference on computational linguistics | 2014

UIMA Ruta Workbench: Rule-based Text Annotation

Peter Kluegl; Martin Toepfer; Philip-Daniel Beck; Georg Fette; Frank Puppe

Collaboration


Dive into Georg Fette's collaborations.

Top Co-Authors

Frank Puppe

University of Würzburg

Peter Klügl

University of Würzburg

Hamo Corovic

University of Würzburg

Andreas M. Zeiher

Goethe University Frankfurt
