Uwe Quasthoff | Researchain

Publication

Featured research published by Uwe Quasthoff.

conference on intelligent text processing and computational linguistics | 2004

Language-Independent Methods for Compiling Monolingual Lexical Data

Christian Biemann; Stefan Bordag; Gerhard Heyer; Uwe Quasthoff; Christian Wolff

In this paper we describe a flexible, portable and language-independent infrastructure for setting up large monolingual language corpora. The approach is based on collecting a large amount of monolingual text from various sources. The input data is processed on the basis of a sentence-based text segmentation algorithm. We describe the entry structure of the corpus database as well as various query types and tools for information extraction. Among them, the extraction and usage of sentence-based word collocations is discussed in detail. Finally we give an overview of different applications for this language resource. A WWW interface allows for public access to most of the data and information extraction tools (http://wortschatz.uni-leipzig.de).

Lecture Notes in Computer Science | 2002

Automatic Analysis of Large Text Corpora - A Contribution to Structuring WEB Communities

Gerhard Heyer; Uwe Quasthoff; Christian Wolff

This paper describes a corpus linguistic analysis of large text corpora based on collocations with the aim of extracting semantic relations from unstructured text. We regard this approach as a viable method for generating and structuring information about WEB communities. Starting from a short description of our corpora as well as our language analysis tools, we discuss in depth the automatic generation of collocation sets. We further give examples of different types of relations that may be found in collocation sets for arbitrary terms. We conclude with a brief discussion of applying our approach to the analysis of a sample community.

Lecture Notes in Computer Science | 2003

Small Worlds of Concepts and Other Principles of Semantic Search

Stefan Bordag; Gerhard Heyer; Uwe Quasthoff

Combining the strengths of classic information retrieval with the distributed approach of P2P networks can avoid the weaknesses of both: organising document collections relevant to special communities allows both high coverage and quick access. We present a theoretical framework in which the semantic structure between words can be deduced from a document collection. This structural knowledge can then be used to connect document collections to communities based on their content.

Lecture Notes in Computer Science | 2004

Calculating communities by link analysis of URLs

Gerhard Heyer; Uwe Quasthoff

Collocation analysis finds semantic associations of concepts using large text corpora. If the same procedure is applied to sets of outgoing links of web pages, we can find semantically related web domains to a large extent. The structure of the semantic clusters shows all properties of small worlds. The algorithm is known to work for large parts of the web like the German internet. As a sample application we present a surf guide for the German web.
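The idea of applying co-occurrence analysis to sets of outgoing links can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names `link_domains` and `related_domains`, the `min_count` threshold, and the use of the Dice coefficient as the association score are all assumptions made for illustration, since the abstract does not name a measure.

```python
from collections import Counter
from itertools import combinations
from urllib.parse import urlparse


def link_domains(urls):
    """Return the set of domains (host names) found in a list of URLs."""
    return {urlparse(u).netloc.lower() for u in urls if urlparse(u).netloc}


def related_domains(pages, min_count=2):
    """Score domain pairs that are linked from the same pages.

    `pages` maps a page identifier to its list of outgoing link URLs.
    The score is the Dice coefficient (an assumed stand-in measure):
    2 * joint / (freq_a + freq_b).
    """
    domain_freq = Counter()  # number of pages linking to each domain
    pair_freq = Counter()    # number of pages linking to both domains
    for outgoing in pages.values():
        domains = link_domains(outgoing)
        domain_freq.update(domains)
        pair_freq.update(combinations(sorted(domains), 2))

    scores = {}
    for (a, b), joint in pair_freq.items():
        if joint >= min_count:  # ignore pairs seen together too rarely
            scores[(a, b)] = 2 * joint / (domain_freq[a] + domain_freq[b])
    return scores
```

Each page's link set plays the role that a sentence plays in text collocation analysis: two domains count as co-occurring when the same page links to both.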

international conference on computational linguistics | 2002

Named entity learning and verification: expectation maximization in large corpora

Uwe Quasthoff; Christian Biemann; Christian Wolff

The regularity of named entities is used to learn names and to extract named entities. Starting from only a few name elements and a set of patterns, the algorithm learns new names and their elements. A verification step ensures quality using a large background corpus. Further improvement is achieved by classifying the newly learnt elements at the character level. Moreover, unsupervised rule learning is discussed.

Archive | 2013

Statistical Corpus and Language Comparison on Comparable Corpora

Thomas Eckart; Uwe Quasthoff

With the wide availability of textual data in various languages, domains and registers, it is easy to create text corpora for a variety of applications. These include, among many others, the field of Natural Language Processing. The Leipzig Corpora Collection has created and used such corpora for more than fifteen years. However, the work on preprocessing distributed resources to ensure homogeneity and thus comparability is an ongoing process. As a result, corpora created in identical formats allow the use of different statistical methods to generate various data for manual or automatic analysis. These form the basis for applications in intra- and inter-language comparison or quality assurance of text stocks.

Informatik und Ausbildung, GI-Fachtagung 98, Informatik und Ausbildung | 1998

Praktikum Elektronisches Publizieren für Informatiker

Uwe Quasthoff; Christian Wolff

Designing multimedia information systems calls for a cross-disciplinary qualification that the classic computer science curriculum conveys only in part. Alongside technical aspects such as data formats, programming languages for multimedia applications, and algorithms for image and audio applications, there are requirements such as media and interaction design (cf. Degen 1996), the analysis and planning of complex multimedia structures, and project management (cf. Burger 1996). Building on an undergraduate computer science degree and embedded in the specialisation in automatic language processing, the Institut für Informatik at Universität Leipzig has regularly run a practical course in electronic publishing since 1996. Its aim is to teach the practice of multimedia production through an extensive yet manageable project. The focus is less on pedagogically motivated questions of instructional design (cf. Issing 1997) or the use of multimedia systems than on learning to work with authoring systems and on carrying out a publication project cooperatively. No clear boundary can be drawn between electronic publishing and multimedia (cf. Nielsen 1996, Sandkuhl & Kindt 1996). However, the emphasis on converting and revising primarily textual material for the electronic medium, rather than on interactivity or the synchronisation of different media, suggests the term electronic publishing.

international conference on computational linguistics | 2014

Using Significant Word Co-occurences for the Lexical Access Problem

Rico Feist; Daniel Gerighausen; Manuel Konrad; Georg Richter; Thomas Eckart; Dirk Goldhahn; Uwe Quasthoff

One way to analyse word relations is to examine their co-occurrence in the same context. This allows for the identification of potential semantic or lexical relationships between words. As previous studies have shown, word co-occurrences often reflect human stimulus-response pairs. In this paper, significant sentence co-occurrences at the word level were used to identify potential responses for word stimuli, based on three automatically generated text corpora of the Leipzig Corpora Collection.
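As a rough illustration of ranking candidate responses for a stimulus word by sentence co-occurrence, consider the sketch below. It is not the method of the paper: the function names, the whitespace tokenisation, the `min_count` threshold, and the choice of pointwise mutual information (PMI) as the significance score are assumptions, because the abstract does not specify the measure used.

```python
import math
from collections import Counter
from itertools import combinations


def sentence_cooccurrences(sentences):
    """Count in how many sentences each word and each word pair occurs."""
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())  # naive tokenisation (assumption)
        word_freq.update(words)
        pair_freq.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_freq, pair_freq


def rank_responses(stimulus, sentences, min_count=2):
    """Rank words co-occurring with `stimulus` by PMI, highest first.

    PMI = log2(joint * n / (freq_stimulus * freq_other)), where n is the
    number of sentences. Ties are broken alphabetically.
    """
    word_freq, pair_freq = sentence_cooccurrences(sentences)
    n = len(sentences)
    scored = []
    for pair, joint in pair_freq.items():
        if stimulus not in pair or joint < min_count:
            continue
        (other,) = pair - {stimulus}
        pmi = math.log2(joint * n / (word_freq[stimulus] * word_freq[other]))
        scored.append((other, pmi))
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

The `min_count` threshold drops pairs seen together only once, because PMI tends to overrate rare pairs; a real system would likely use a more robust significance test.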

Archive | 2014

Building Large Resources for Text Mining: The Leipzig Corpora Collection

Uwe Quasthoff; Dirk Goldhahn; Thomas Eckart

Many text mining algorithms and applications require the availability of large text corpora and certain statistics-based annotations. To ensure comparability of results, a standardized corpus building process is required. Particularly noteworthy are all pre-processing procedures, as they are crucial for the quality of the resulting data stock. This quality can be estimated both by evaluating the corpus building process and by statistical quality measurements on the corpus. Some of these approaches are described using the example of the Leipzig Corpora Collection.

acm conference on hypertext | 1991

MATHBANK: Mathematisches Fachwissen als Hypertext

Uwe Quasthoff

This paper describes the MATHBANK project. The goal is to build a hypertext system that provides access to the knowledge contained in a sizeable reference library of mathematical textbooks. The discussion focuses on using the structure inherent in mathematics to create cards, and on the possibilities of automatic language processing for analysing mathematical texts when generating links.
