Publication


Featured research published by Stefan Bordag.


conference on intelligent text processing and computational linguistics | 2004

Language-Independent Methods for Compiling Monolingual Lexical Data

Christian Biemann; Stefan Bordag; Gerhard Heyer; Uwe Quasthoff; Christian Wolff

In this paper we describe a flexible, portable and language-independent infrastructure for setting up large monolingual language corpora. The approach is based on collecting a large amount of monolingual text from various sources. The input data is processed on the basis of a sentence-based text segmentation algorithm. We describe the entry structure of the corpus database as well as various query types and tools for information extraction. Among them, the extraction and usage of sentence-based word collocations is discussed in detail. Finally we give an overview of different applications for this language resource. A WWW interface allows for public access to most of the data and information extraction tools (http://wortschatz.uni-leipzig.de).
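
The sentence-based word collocations mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simple regular-expression tokenizer and raw pair counts; the function name `sentence_cooccurrences` and the sample sentences are hypothetical and are not taken from the paper or from the Wortschatz tools.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_cooccurrences(sentences):
    """Count how often each unordered word pair shares a sentence."""
    pair_counts = Counter()
    for sentence in sentences:
        # Lowercase the sentence and keep each word type once per sentence.
        words = sorted(set(re.findall(r"\w+", sentence.lower())))
        # Sorting makes every pair key alphabetically ordered, e.g. ("bank", "river").
        pair_counts.update(combinations(words, 2))
    return pair_counts


# Hypothetical sample corpus, one sentence per entry.
sentences = [
    "The bank raised interest rates.",
    "The river bank was flooded.",
    "Interest rates fell again.",
]
counts = sentence_cooccurrences(sentences)
print(counts[("interest", "rates")])
```

A real corpus pipeline would presumably replace the raw counts with a statistical significance score before treating a pair as a collocation.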


international conference on computational linguistics | 2008

A comparison of co-occurrence and similarity measures as simulations of context

Stefan Bordag

Observations of word co-occurrences and similarity computations are often used as a straightforward way to represent the global contexts of words and achieve a simulation of semantic word similarity for applications such as word or document clustering and collocation extraction. Despite the simplicity of the underlying model, it is necessary to select a proper significance measure, a similarity measure and a similarity computation algorithm. However, it is often unclear how the measures are related, and dimensionality reduction is often applied to enable the efficient computation of word similarity. This work presents an approximate algorithm with linear time complexity for computing word similarity without any dimensionality reduction. It then introduces a large-scale evaluation based on two languages and two knowledge sources and discusses the underlying reasons for the relative performance of each measure.
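
As a rough illustration of the general idea (not the linear-time algorithm from the paper), the sketch below represents each word by its sentence co-occurrence counts and compares two words with cosine similarity. Raw counts stand in for a significance measure here, and cosine is only one possible similarity measure; all names and the toy corpus are assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_vectors(tokenized_sentences):
    """Map each word to a Counter of the words it shares sentences with."""
    vectors = defaultdict(Counter)
    for tokens in tokenized_sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus, already tokenized.
corpus = [
    ["cat", "drinks", "milk"],
    ["dog", "drinks", "water"],
    ["cat", "chases", "mouse"],
    ["dog", "chases", "cat"],
]
vectors = cooccurrence_vectors(corpus)
similarity = cosine(vectors["cat"], vectors["dog"])
print(round(similarity, 3))
```

Comparing every word pair this way is quadratic in the vocabulary size, which is the cost that approximate methods aim to avoid.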


cross-language evaluation forum | 2008

Unsupervised and Knowledge-Free Morpheme Segmentation and Analysis

Stefan Bordag

This paper presents a revised version of an unsupervised and knowledge-free morpheme boundary detection algorithm based on letter successor variety (LSV) and a trie classifier [1]. Additional knowledge about relatedness of the found morphs is obtained from a morphemic analysis based on contextual similarity. For the boundary detection, the challenge of increasing recall of found morphs while retaining a high precision is tackled by adding a compound splitter, iterating the LSV analysis and dividing the trie classifier into two distinctly applied classifiers. The result is a significantly improved overall performance and a decreased reliance on corpus size. Further possible improvements and analyses are discussed.
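
Letter successor variety can be sketched in its basic form: for each prefix of a word, count how many distinct letters follow that prefix anywhere in the vocabulary. The sketch below covers only this basic idea, under the assumption that the end of a word is not counted as a successor; it does not include the trie classifier, compound splitter or iteration described in the abstract, and the vocabulary is hypothetical.

```python
from collections import defaultdict


def build_successors(vocabulary):
    """Map each prefix to the set of letters that follow it in any word."""
    successors = defaultdict(set)
    for word in vocabulary:
        for i in range(len(word)):
            successors[word[:i]].add(word[i])
    return successors


def successor_variety(word, successors):
    """Return the LSV after each proper prefix word[:i], i = 1 .. len(word) - 1."""
    return [len(successors.get(word[:i], ())) for i in range(1, len(word))]


# Hypothetical vocabulary; a real run would use word types from a large corpus.
vocabulary = ["reads", "reading", "reader", "readable", "real", "red"]
successors = build_successors(vocabulary)
lsv = successor_variety("reading", successors)
print(list(zip("readin", lsv)))
# → [('r', 1), ('e', 2), ('a', 2), ('d', 4), ('i', 1), ('n', 1)]
```

The peak after the prefix "read" suggests a morpheme boundary between "read" and "ing", which is the kind of signal a boundary detector can threshold on.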


international conference on computational linguistics | 2003

Sentence co-occurrences as small-world graphs: a solution to automatic lexical disambiguation

Stefan Bordag

This paper presents a graph-theoretical approach to lexical disambiguation based on word co-occurrences. Producing a dictionary similar to WordNet, this method is the counterpart to word sense disambiguation and thus makes one more step towards completely unsupervised natural language processing algorithms as well as a generally better understanding of how to make computers meaningfully process natural language data.


Lecture Notes in Computer Science | 2003

Small Worlds of Concepts and Other Principles of Semantic Search

Stefan Bordag; Gerhard Heyer; Uwe Quasthoff

Combining the strengths of classic information retrieval with the distributed approach of P2P networks can avoid the weaknesses of both: organising document collections relevant to special communities allows both high coverage and quick access. We present a theoretical framework in which the semantic structure between words can be deduced from a document collection. This structural knowledge can then be used to connect document collections to communities based on their content.


Aspects of Automatic Text Analysis | 2007

A Structuralist Framework for Quantitative Linguistics

Stefan Bordag; Gerhard Heyer

Summary. Recent advances in the quantitative analysis of natural language call for a theoretical framework that explains how these advances are possible. This helps to unify different approaches and algorithms in quantitative linguistics. We consider the linguistic tradition of structuralism as a basis for such a framework. In what follows, we focus on syntagmatic and paradigmatic relations and attempt to describe them in a coherent way. We present an abstract version of a (neo-)structuralist language model and show how already known algorithms fit into it. We also show how new algorithms can be derived from it. As has already been predicted by linguists like Firth and Harris, it is possible to construct a computational model of language based on linguistic structuralism and statistical mathematics. The model we propose specifically helps to explain fully unsupervised algorithms for natural language processing which are based on well-known methods like co-occurrence measures and clustering.


text speech and dialogue | 2003

Advances in Automatic Speech Recognition by Imitating Spreading Activation

Stefan Bordag; Denisa Bordag

Inspired by recent insights into the properties of statistical word co-occurrences, we propose a mechanism which imitates spreading activation in the human mind in order to improve the identification of words during the automatic speech recognition process. This mechanism is able to make accurate semantic predictions about the currently uttered word as well as about words which are likely to come in the rest of a sentence. A robust automatic disambiguation algorithm provides a framework for semantic clustering, which makes it possible to avoid the inherent polysemy problem.


annual meeting of the special interest group on discourse and dialogue | 2015

ExB Text Summarizer

Stefan Thomas; Christian Beutenmüller; Xose de la Puente; Robert Remus; Stefan Bordag

We present our state-of-the-art multilingual text summarizer capable of single- as well as multi-document text summarization. The algorithm is based on repeated application of TextRank on a sentence similarity graph, a bag-of-words model for sentence similarity and a number of linguistic pre- and post-processing steps using standard NLP tools. We submitted this algorithm for two different tasks of the MultiLing 2015 summarization challenge: Multilingual Single-document Summarization and Multilingual Multi-document Summarization.
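
A single TextRank pass over a sentence similarity graph can be sketched as follows. This is an illustration under stated assumptions, not the ExB system: it uses cosine similarity over bag-of-words counts as the edge weight, applies TextRank once rather than repeatedly, and skips all linguistic pre- and post-processing. The function names, the damping value and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts for one sentence."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over their similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [bag_of_words(s) for s in sentences]
    # Symmetric edge weights between sentences; no self-loops.
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives a share of every neighbour's score,
        # proportional to the weight of the connecting edge.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


# Hypothetical input document, already split into sentences.
sentences = [
    "The summarizer ranks sentences with TextRank.",
    "TextRank ranks sentences on a similarity graph.",
    "Lunch was served at noon.",
]
scores = textrank(sentences)
ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

A summary would then take the top entries of `ranked` in their original document order; the off-topic third sentence shares no words with the others and so keeps only the baseline score.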


Archive | 2014

A Structuralist Approach for Personal Knowledge Exploration Systems on Mobile Devices

Stefan Bordag; Christian Hänig; Christian Beutenmüller

We describe the choices we made, and the reasons behind them, when designing an architecture for a multilingual Natural Language Processing (NLP) system for mobile devices. The most tangible limitations and problems are the limited processing power of mobile devices, the strong influence of idiolect (or, more generally, differences in personal language usage between individual users in their personal communication), the effort required to port the NLP system to multiple languages, and finally the additional processing layers required when dealing with real-world data as opposed to controlled academic set-ups. Our solution is based on a strict differentiation between server-side preprocessing and client-side processing, as well as maximized usage of unsupervised techniques to avoid the problems posed by personal language usage variations. Hence it represents an adequate combination of solutions to provide robust NLP despite all these limitations.


conference of the european chapter of the association for computational linguistics | 2006

Word Sense Induction: Triplet-Based Clustering and Automatic Evaluation

Stefan Bordag
