Publication


Featured research published by Harald Lüngen.


Literary and Linguistic Computing | 2005

Unification of XML Documents with Concurrent Markup

Andreas Witt; Daniela Goecke; Felix Sasaki; Harald Lüngen

An approach to the unification of XML (Extensible Markup Language) documents with identical textual content and concurrent markup in the framework of XML-based multi-layer annotation is introduced. A Prolog program allows the possible relationships between element instances on two annotation layers that share PCDATA to be explored and also the computing of a target node hierarchy for a well-formed, merged XML document. Special attention is paid to identity conflicts between element instances, for which a default solution that takes into account metarelations that hold between element types on the different annotation layers is provided. In addition, rules can be specified by a user to prescribe how identity conflicts should be solved for certain element types.


Meeting of the Association for Computational Linguistics | 2004

Text type structure and logical document structure

Hagen Langer; Harald Lüngen; Petra Saskia Bayerl

Most research on automated categorization of documents has concentrated on the assignment of one or many categories to a whole text. However, new applications, e.g. in the area of the Semantic Web, require a richer and more fine-grained annotation of documents, such as detailed thematic information about the parts of a document. Hence we investigate the automatic categorization of text segments of scientific articles with XML markup into 16 topic types from a text type structure schema. A corpus of 47 linguistic articles was provided with XML markup on different annotation layers representing text type structure, logical document structure, and grammatical categories. Six different feature extraction strategies were applied to this corpus and combined in various parametrizations in different classifiers. The aim was to explore the contribution of each type of information, in particular the logical structure features, to the classification accuracy. The results suggest that some of the topic types of our hierarchy are successfully learnable, while the features from the logical structure layer had no particular impact on the results.


International Conference on Natural Language Processing | 2006

Discourse segmentation of German written texts

Harald Lüngen; Csilla Puskás; Maja Bärenfänger; Mirco Hilbert; Henning Lobin

Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves in the trees that are used to represent discourse structures. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while considering punctuation, morphology, syntax, and aspects of the logical document structure of a complex text type, namely scientific articles. The algorithm and implementation of a discourse segmenter based on these principles are presented, as well as an evaluation of test runs.


Archive | 2000

Speech Lexica and Consistent Multilingual Vocabularies

Dafydd Gibbon; Harald Lüngen

This contribution describes the theoretical foundations and lexical engineering procedures used in developing a common, consistent, linguistically and formally well-defined lexical database for all components of the Verbmobil speech-to-speech translation system.


Zeitschrift für Sprachwissenschaft | 2007

Repräsentation und Verknüpfung allgemeinsprachlicher und terminologischer Wortnetze in OWL

Claudia Kunze; Lothar Lemnitzer; Harald Lüngen; Angelika Storrer

This paper describes an approach to modelling a general-language wordnet, GermaNet, and a domain-specific wordnet, TermNet, in the web ontology language OWL. While the modelling process for GermaNet adopts relevant recommendations with respect to the English Princeton WordNet, for TermNet an alternative modelling concept is developed that considers the special characteristics of domain-specific terminologies. We present a proposal for linking a general-language wordnet and a terminological wordnet within the framework of OWL and on this basis discuss problems and alternative modelling approaches.


Natural Language Processing and Speech Technology, Results of the 3rd KONVENS Conference | 1996

The Treatment of Compounds in a Morphological Component for Speech Recognition

Frederek Althoff; Guido Drexel; Harald Lüngen; Martina Pampel; Christoph Schillo

This paper describes a morphological component in a speech recognition system for German, dealing with the construction of complex word form hypotheses out of a lattice of simplex forms. Our example is the recognition of compounds from their individual components. Evaluation results are presented for speech recognition with and without morphologically based word recognition.

Goals and motivation

This paper proposes a strategy for partially satisfying the growing demands on speech recognition systems (e.g. large vocabulary recognition, few domain restrictions, robustness, and unknown word recognition) by integrating morphological knowledge into the speech recognition process. Current stochastic word recognizers have, for example, certain difficulties with compound word forms. Compounds can be defined as words which are built compositionally from other words or stems of words that can occur as free forms. Examples of German compounds are Arzttermin (constituents: Arzt, Termin), Arbeitsamt (constituents: Arbeit, Amt), and Wochenendtermin (constituents: Woche, Ende, Termin).

Compounding is a frequent phenomenon in spontaneous speech. In the current VERBMOBIL transliteration corpus of [...] wordform tokens and the related lexical database of [...] wordform types, the token frequency of compounds is [...]; the type frequency amounts to [...]. Both compounds and their individual constituents were included in the recognition dictionary, and most of the compounds, as well as their individual constituents (but in almost all their possible inflected forms), occurred in the output lattice of the stochastic word recognition system (cf. Hübener et al.). A dictionary of this kind is highly redundant; large dictionaries reduce the speed of the stochastic word recognition; and, in view of the infinite number of potential out-of-vocabulary compounds, an exhaustive lexical listing is simply not feasible.

For the task of recognizing out-of-vocabulary words, the employment of phonotactic constraints on well-formed syllable structures has already been tested (see, e.g., Jusek et al.). Since complex words consist of units which are members of a finite set of morphs, it is also possible to specify morphotactic rules which operate on this finite morph lexicon to derive complex word forms. It is obvious that the set of actual morphs (those which are lexicalized in a morph lexicon) is only a subset of the set of potential morphs (those which satisfy the phonotactic constraints). Thus an integration of morphological knowledge leads to more specific constraints on out-of-vocabulary complex word forms.

Occurrences of discontinuous (split) word forms are a further problem in recognizing spontaneous speech. These often cannot be detected by speech recognition systems because their phonological material is torn apart by slips of the tongue, repetitions, pauses, or other insertions. An analysis of split word forms in our corpus demonstrated that most are compounds split at morphological boundaries. Although split compounds are not easily recognized by stochastic [...]

This paper was originally published in Dafydd Gibbon (ed.), Natural Language Processing and Speech Technology: Results of the 3rd KONVENS Conference, Bielefeld, October 1996, pp. [...]. Berlin etc.: Mouton de Gruyter.


Archive | 2010

Discourse Relations and Document Structure

Harald Lüngen; Maja Bärenfänger; Mirco Hilbert; Henning Lobin; Csilla Puskás

This chapter addresses the requirements and linguistic foundations of automatic relational discourse analysis of complex text types such as scientific journal articles. It is argued that besides lexical and grammatical discourse markers, which have traditionally been employed in discourse parsing, cues derived from the logical and generical document structure and the thematic structure of a text must be taken into account. An approach to modelling such types of linguistic information in terms of XML-based multi-layer annotations and to a text-technological representation of additional knowledge sources is presented. By means of quantitative and qualitative corpus analyses, cues and constraints for automatic discourse analysis can be derived. Furthermore, the proposed representations are used as the input sources for discourse parsing. A short overview of the projected parsing architecture is given.


Modeling, Learning, and Processing of Text Technological Data Structures | 2011

Processing Text-Technological Resources in Discourse Parsing

Henning Lobin; Harald Lüngen; Mirco Hilbert; Maja Bärenfänger

Discourse parsing of complex text types such as scientific research articles requires the analysis of an input document on linguistic and structural levels that go beyond traditionally employed lexical discourse markers. This chapter describes a text-technological approach to discourse parsing. Discourse parsing with the aim of providing a discourse structure is seen as the addition of a new annotation layer for input documents marked up on several linguistic annotation levels. The discourse parser generates discourse structures according to the Rhetorical Structure Theory. An overview of the knowledge sources and components for parsing scientific journal articles is given. The parser’s core consists of cascaded applications of the GAP, a Generic Annotation Parser. Details of the chart parsing algorithm are provided, as well as a short evaluation in terms of comparisons with reference annotations from our corpus and with recently developed systems with a similar task.


Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017, including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017. Mannheim: Institut für Deutsche Sprache | 2017

Proceedings of the workshop on challenges in the management of large corpora and big data and natural language processing (CMLC-5+BigNLP) 2017 including the papers from the web-as-corpus (WAC-XI) guest section. Birmingham, 24 July 2017

Piotr Bański; Marc Kupietz; Harald Lüngen; Paul Rayson; Hanno Biber; Evelyn Breiteneder; Simon Clematide; John A. Mariani; Mark Stevenson; Theresa Sick

Many (modernist) works of literature can be understood by their associativeness, be it constructed or “free”. This network-like character of (modernist) literature has often been addressed by terms like “free association”, “connotation”, “context” or “intertext”. This paper proposes an experimental and exemplary approach to intraconnect a literary corpus of the Austrian writer Ilse Aichinger with semantic web technologies to enable interactive explorations of word associations.


Modeling, Learning, and Processing of Text Technological Data Structures | 2011

Introduction: Modeling, Learning and Processing of Text-Technological Data Structures

Alexander Mehler; Kai-Uwe Kühnberger; Henning Lobin; Harald Lüngen; Angelika Storrer; Andreas Witt

Researchers in many disciplines, sometimes working in close cooperation, have been concerned with modeling textual data in order to account for texts as the prime information unit of written communication. The list of disciplines includes computer science and linguistics as well as more specialized disciplines like computational linguistics and text technology. What many of these efforts have in common is the aim to model textual data by means of abstract data types or data structures that support at least the semi-automatic processing of texts in any area of written communication.

Collaboration


Dive into Harald Lüngen's collaborations.

Top Co-Authors

Michael Beißwenger

Technical University of Dortmund
