Tim vor der Brück
FernUniversität Hagen
Publications
Featured research published by Tim vor der Brück.
Text, Speech and Dialogue | 2010
Sven Hartrumpf; Tim vor der Brück; Christian Eichhorn
Identifying duplicate texts is important in many areas like plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and see each text only as a token list. In this work, however, we describe a deep, semantically oriented method based on semantic networks which are derived by a syntactico-semantic parser. Semantically identical or similar semantic networks for each sentence of a given base text are efficiently retrieved by using a specialized index. In order to detect many kinds of paraphrases, the semantic networks of a candidate text are varied by applying inferences: lexico-semantic relations, relation axioms, and meaning postulates. Important phenomena occurring in difficult duplicates are discussed. The deep approach profits from background knowledge, whose acquisition from corpora is explained briefly. The deep duplicate recognizer is combined with two shallow duplicate recognizers in order to guarantee a high recall for texts which are not fully parsable. The evaluation shows that the combined approach preserves recall and increases precision considerably in comparison to traditional shallow methods.
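To make the combination idea concrete, here is a minimal sketch that assumes a Jaccard token-overlap score as the shallow recognizer and a stubbed deep score; the paper's actual recognizers operate on semantic networks produced by a syntactico-semantic parser, and the combination weights below are purely illustrative.

from typing import Optional

def shallow_overlap(text_a: str, text_b: str) -> float:
    """Jaccard overlap of token sets -- a typical surface-oriented score."""
    tokens_a, tokens_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def combined_duplicate_score(text_a: str, text_b: str,
                             deep_score: Optional[float]) -> float:
    """Fall back to the shallow score when the deep parser fails on a text;
    otherwise average both scores (one simple combination scheme)."""
    shallow = shallow_overlap(text_a, text_b)
    if deep_score is None:           # text not fully parsable
        return shallow
    return 0.5 * shallow + 0.5 * deep_score

print(combined_duplicate_score("the cat sat on the mat",
                               "a cat was sitting on the mat", deep_score=None))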
The Prague Bulletin of Mathematical Linguistics | 2016
Steffen Eger; Tim vor der Brück; Alexander Mehler
We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple ‘k-best decoding plus dictionary lookup’ strategy performs in this context and find that such an approach can significantly outperform baselines such as edit distance, weighted edit distance, and the noisy-channel model of Brill and Moore for spelling error correction. We also consider elementary combination techniques for our models such as language model weighted majority voting and center string combination. Finally, we consider real-world OCR post-correction for a dataset sampled from medieval Latin texts.
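As a rough illustration of the ‘k-best decoding plus dictionary lookup’ idea, the sketch below ranks dictionary entries by plain Levenshtein distance; in the paper, the candidate ranking would come from the trained string-to-string models instead, and the lexicon and k here are invented.

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(word: str, dictionary: set, k: int = 5) -> str:
    """Keep the k closest dictionary entries and return the best one."""
    candidates = sorted(dictionary, key=lambda w: edit_distance(word, w))[:k]
    return candidates[0] if candidates else word

lexicon = {"dominus", "domina", "dominium", "domus"}
print(correct("dominvs", lexicon))   # -> "dominus"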
SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities | 2015
Tim vor der Brück; Steffen Eger; Alexander Mehler
We present a survey of tagging accuracies — concerning part-of-speech and full morphological tagging — for several taggers based on a corpus for medieval church Latin (see www.comphistsem.org). The best tagger in our sample, Lapos, has a PoS tagging accuracy of close to 96% and an overall tagging accuracy (including full morphological tagging) of about 85%. When we ‘intersect’ the taggers with our lexicon, the latter score increases to almost 91% for Lapos. A conservative assessment of lemmatization accuracy on our data estimates a score of 93-94% for a lexicon-based lemmatization strategy and a score of 94-95% for lemmatizing via trained lemmatizers.
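A minimal sketch of the ‘intersection’ step, assuming a tagger that returns a ranked tag list per token and a lexicon that maps word forms to admissible tags; the Latin forms and tag names below are invented for illustration.

def intersect_with_lexicon(word, ranked_tags, lexicon):
    """Pick the best-ranked tag the lexicon admits for this word form,
    falling back to the tagger's first choice if there is no overlap."""
    admitted = lexicon.get(word.lower())
    if admitted:
        for tag in ranked_tags:
            if tag in admitted:
                return tag
    return ranked_tags[0]

lexicon = {"ecclesia": {"NOUN"}, "sancta": {"ADJ", "NOUN"}}
print(intersect_with_lexicon("sancta", ["VERB", "ADJ"], lexicon))   # -> "ADJ"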
Archive | 2014
Alexander Mehler; Tim vor der Brück; Rüdiger Gleim; T. Geelhaar
The analysis of longitudinal corpora of historical texts requires the integrated development of tools for automatically preprocessing these texts and for building representation models of their genre- and register-related dynamics. In this chapter we present such a joint endeavor that ranges from resource formation via preprocessing to network-based text representation and classification. We start by presenting the so-called TTLab Latin Tagger (TLT) that preprocesses texts of classical and medieval Latin. Its lexical resource in the form of the Frankfurt Latin Lexicon (FLL) is also briefly introduced. As a first test case for showing the expressiveness of these resources, we perform a tripartite classification task of authorship attribution, genre detection and a combination thereof. To this end, we introduce a novel text representation model that explores the core structure (the so-called coreness) of lexical network representations of texts. Our experiment shows the expressiveness of this representation format and, indirectly, of our Latin preprocessor.
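The coreness idea can be sketched as a k-core decomposition of a lexical co-occurrence network; the concrete network construction and feature design in the chapter may differ, and the toy sentences below are invented (requires networkx).

import itertools
import networkx as nx

def coreness_profile(sentences):
    """Build a lexical co-occurrence network (one edge per pair of words that
    co-occur in a sentence) and return the core number of each word."""
    graph = nx.Graph()
    for sentence in sentences:
        for u, v in itertools.combinations(set(sentence), 2):
            graph.add_edge(u, v)
    return nx.core_number(graph)   # word -> k of the maximal k-core containing it

sentences = [["rex", "regina", "regnum"], ["rex", "regnum", "lex"], ["lex", "populus"]]
print(coreness_profile(sentences))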
International Conference on Machine Learning and Applications | 2010
Tim vor der Brück; Hermann Helbig
There is a substantial body of work on the extraction of relations from texts, most of which is based on pattern matching or on applying tree kernel functions to syntactic structures. Whereas pattern application is usually more efficient, tree kernels can be superior when assessed by the F-measure. In this paper, we introduce a hybrid approach to extracting meronymy relations, which is based on both patterns and kernel functions. In a first step, meronymy relation hypotheses are extracted from a text corpus by applying patterns. In a second step, these relation hypotheses are validated by using several shallow features and a graph kernel approach. In contrast to other meronymy extraction and validation methods, which are based on surface or syntactic representations, we use a purely semantic approach based on semantic networks. This involves analyzing each sentence of the Wikipedia corpus by a deep syntactico-semantic parser and converting it into a semantic network. Meronymy relation hypotheses are extracted from the semantic networks by means of an automated theorem prover, which employs a set of logical axioms and patterns in the form of semantic networks. The meronymy candidates are then validated by means of a graph kernel approach based on common walks. The evaluation shows that this method achieves considerably higher accuracy, recall, and F-measure than a method using purely shallow validation.
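A toy version of a common-walks graph kernel is sketched below: two labeled graphs are compared by counting walks of bounded length in their direct product graph, where only equally labeled nodes are paired. The kernel actually used over semantic networks in the paper is more elaborate; the graphs and decay factor here are illustrative (requires numpy).

import numpy as np

def product_adjacency(adj1, labels1, adj2, labels2):
    """Adjacency matrix of the direct product graph over equally labeled node pairs."""
    pairs = [(i, j) for i in range(len(labels1)) for j in range(len(labels2))
             if labels1[i] == labels2[j]]
    ax = np.zeros((len(pairs), len(pairs)))
    for a, (i, j) in enumerate(pairs):
        for b, (k, l) in enumerate(pairs):
            ax[a, b] = adj1[i][k] * adj2[j][l]
    return ax

def common_walk_kernel(adj1, labels1, adj2, labels2, max_len=4, decay=0.5):
    """Sum of (decayed) counts of common walks up to length max_len."""
    ax = product_adjacency(adj1, labels1, adj2, labels2)
    total, power = 0.0, np.eye(len(ax))
    for step in range(1, max_len + 1):
        power = power @ ax                 # walks of length `step`
        total += (decay ** step) * power.sum()
    return total

g1 = ([[0, 1], [1, 0]], ["part", "whole"])
g2 = ([[0, 1, 0], [1, 0, 1], [0, 1, 0]], ["part", "whole", "part"])
print(common_walk_kernel(g1[0], g1[1], g2[0], g2[1]))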
Language and Technology Conference | 2009
Tim vor der Brück; Sven Hartrumpf
One major reason why readability checkers are still far from judging the understandability of texts is that they use no semantic information. Syntactic, lexical, or morphological information provides only limited means of estimating how cognitively difficult a text is for a human being to comprehend. In this paper, however, we present a readability checker which additionally uses semantic information. This information is represented as semantic networks and is derived by a deep syntactico-semantic analysis. We investigate in which situations a semantic readability indicator can lead to superior results in comparison with ordinary surface indicators like sentence length. Finally, we compute the weights of our semantic indicators in the readability function based on the user ratings collected in an online evaluation.
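To make the last step concrete: a minimal sketch in which the weights of a linear readability function are fitted to user ratings by least squares; the indicator values and ratings below are invented, not the study's data (requires numpy).

import numpy as np

# Each row: [mean sentence length, mean word length, semantic-indicator score]
indicators = np.array([[12.0, 4.8, 0.20],
                       [25.0, 5.6, 0.65],
                       [18.0, 5.1, 0.40],
                       [30.0, 6.2, 0.80]])
user_ratings = np.array([1.5, 4.0, 2.5, 4.5])   # higher = harder to read

# Least-squares fit of rating ~ w . indicators + bias
design = np.hstack([indicators, np.ones((len(indicators), 1))])
weights, *_ = np.linalg.lstsq(design, user_ratings, rcond=None)
print("learned weights (incl. bias):", weights)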
International Conference on Human-Computer Interaction | 2014
Alexander Mehler; Tim vor der Brück; Andy Lücking
HCI systems are often equipped with gestural interfaces drawing on a predefined set of admitted gestures. We provide an assessment of the fitness of such gesture vocabularies in terms of their learnability and naturalness. This is done using the example of rivaling gesture vocabularies of the museum information system WikiNect. In this way, we not only provide a procedure for evaluating gesture vocabularies, but also contribute design criteria to be followed by the gestures.
Zeitschrift für Sprachwissenschaft | 2007
Tim vor der Brück; Stephan Busemann
Tree mapping grammars are used in natural language generation (NLG) to map non-linguistic input onto a derivation tree from which the target text can be trivially read off as the terminal yield. Such grammars may consist of a large number of rules. Finding errors is quite tedious and sometimes very time-consuming. Often the generation fails because the relevant input subtree is not specified correctly. This work describes a method to detect and correct wrong assignments of input subtrees to grammar categories by cross-validating grammar rules with the given input structures. The method also detects and corrects the incorrect usage of a category in a grammar rule. The result is implemented in a grammar development workbench and accelerates the grammar writer's work considerably. The paper suggests that the algorithms can be ported to other areas in which tree mapping is required.
Lecture Notes in Computer Science | 2004
Andreas Eisele; Tim vor der Brück
Error-tolerant lookup of words in large vocabularies has many potential uses, both within and beyond natural language processing (NLP). This work describes a generic library for finite-state-based lexical lookup, originally designed for NLP-related applications, that can be adapted to application-specific error metrics. We show how this tool can be used for searching existing trademarks in a database, using orthographic and phonetic similarity. We sketch a prototypical implementation of a trademark search engine and show results of a preliminary evaluation of this system.
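A toy sketch of the lookup idea, combining an orthographic similarity score with a very crude phonetic normalization; the actual library uses finite-state technology with application-specific error metrics, and the marks, threshold, and phonetic rules below are illustrative only.

from difflib import SequenceMatcher

def phonetic_key(name: str) -> str:
    """Extremely simplified phonetic normalization (not a real algorithm)."""
    s = name.lower()
    for src, dst in (("ph", "f"), ("ck", "k"), ("z", "s"), ("y", "i")):
        s = s.replace(src, dst)
    return "".join(ch for ch in s if ch.isalpha())

def search(query: str, trademarks: list, min_ratio: float = 0.8) -> list:
    """Return registered marks that are orthographically or phonetically close."""
    hits = []
    for mark in trademarks:
        ortho = SequenceMatcher(None, query.lower(), mark.lower()).ratio()
        if ortho >= min_ratio or phonetic_key(query) == phonetic_key(mark):
            hits.append(mark)
    return hits

print(search("Fonetix", ["Phonetics", "Phonetix", "Sonetix"]))   # -> ['Phonetix', 'Sonetix']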
International Conference on Agents and Artificial Intelligence | 2017
Stefan Schnürle; Marc Pouly; Tim vor der Brück; Alexander A. Navarini; Thomas Koller
Hand eczema is one of the most frequent skin diseases, affecting up to 14% of the population. Early detection and continuous observation of eczemas allow for efficient treatment and can therefore relieve symptoms. However, purely manual skin control is tedious and often error-prone. Thus, an automatic approach that can assist dermatologists in their work is desirable. Together with our industry partner swiss4ward, we devised an image processing method for hand eczema segmentation based on support vector machines and conducted several experiments with different feature sets. Our implementation is planned to be integrated into a clinical information system for operational use at the University Hospital Zurich. Instead of focusing on a high accuracy like most existing state-of-the-art approaches, we selected the F1 score as our primary measure. This decision had several implications for the design of our segmentation method, since all popular implementations of support vector machines aim at optimizing accuracy. Finally, we evaluated our system and achieved an F1 score of 58.6% for the front sides of hands and 43.8% for the back sides, which outperforms several state-of-the-art methods that were also tested on our gold standard data set.
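One way to favor F1 over accuracy with a standard SVM implementation is to tune class weights and hyperparameters against an F1 objective; the sketch below does this with scikit-learn on synthetic, imbalanced toy data and is not the paper's pipeline (requires scikit-learn and numpy).

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy pixel features with a comparatively rare positive (eczema) class.
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 1.2).astype(int)

param_grid = {"C": [0.1, 1, 10],
              "class_weight": [None, "balanced", {0: 1, 1: 5}]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1", cv=3)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("cross-validated F1:", round(search.best_score_, 3))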