Johannes Dellert | Researchain

Publication

Featured research published by Johannes Dellert.

International Conference on Computational Linguistics | 2008

TuLiPA: Towards a Multi-Formalism Parsing Environment for Grammar Engineering

Laura Kallmeyer; Timm Lichte; Wolfgang Maier; Yannick Parmentier; Johannes Dellert; Kilian Evang

In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German.

Bulletin of the Polish Academy of Sciences: Technical Sciences | 2010

TuLiPA - Parsing Extensions of TAG with Range Concatenation Grammars

Laura Kallmeyer; Wolfgang Maier; Yannick Parmentier; Johannes Dellert

In this paper we present a parsing framework for extensions of Tree Adjoining Grammars (TAG) called TuLiPA (Tuebingen Linguistic Parsing Architecture). In particular, besides TAG, the parser can process Tree-Tuple MCTAG with shared nodes (TT-MCTAG), a TAG-extension that has been proposed to deal with scrambling in free word order languages such as German. The central strategy of the parser is to transform the incoming TT-MCTAG (or TAG) into an equivalent Range Concatenation Grammar (RCG), which is then used for parsing. The RCG parser is an incremental Earley-style chart parser. In addition to the syntactic analysis, TuLiPA also computes an underspecified semantic analysis for grammars that are equipped with semantic representations.

Theory and Applications of Satisfiability Testing | 2013

MUStICCa: MUS extraction with interactive choice of candidates

Johannes Dellert; Christian Zielke; Michael Kaufmann

Existing algorithms for minimal unsatisfiable subset (MUS) extraction are defined independently of any symbolic information, and in current implementations domain experts mostly do not have a chance to influence the extraction process based on their knowledge about the encoded problem. The MUStICCa tool introduces a novel graphical user interface for interactive deletion-based MUS finding, allowing the user to inspect and influence the structure of extracted MUSes. The tool is centered around an explicit visualization of the explored part of the search space, representing unsatisfiable subsets (USes) as selectable states. While inspecting the contents of any US, the user can select candidate clauses to initiate deletion attempts. The reduction steps can be enhanced by a range of state-of-the-art techniques such as clause-set refinement, model rotation, and autarky reduction. MUStICCa compactly represents the criticality information derived for the different USes in a shared data structure, which leads to significant savings in the number of solver calls when multiple MUSes are explored. For automatization, our tool includes a reduction agent mechanism into which arbitrary user-implemented deletion heuristics can be plugged.
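For readers unfamiliar with deletion-based MUS finding, the basic loop that the abstract builds on can be sketched as follows. This is only an illustrative sketch and not MUStICCa's implementation: the function names `is_satisfiable` and `deletion_based_mus` are hypothetical, the brute-force satisfiability check stands in for a real SAT solver, and none of the refinements named in the abstract (clause-set refinement, model rotation, autarky reduction) are included.

```python
from itertools import product


def is_satisfiable(clauses):
    """Brute-force SAT check (stand-in for a real solver).

    Clauses are lists of nonzero ints: 3 means x3, -3 means NOT x3.
    """
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return True
    return False


def deletion_based_mus(clauses):
    """Shrink an unsatisfiable clause list to a minimal unsatisfiable subset."""
    if is_satisfiable(clauses):
        raise ValueError("input clause set must be unsatisfiable")
    current = list(clauses)
    i = 0
    while i < len(current):
        # Deletion attempt: try removing the candidate clause at position i.
        candidate = current[:i] + current[i + 1:]
        if is_satisfiable(candidate):
            i += 1  # clause is critical for unsatisfiability: keep it
        else:
            current = candidate  # still unsatisfiable: the clause was redundant
    return current


# Toy formula: x1, NOT x1, (x1 OR x2), (NOT x2 OR x3)
formula = [[1], [-1], [1, 2], [-2, 3]]
mus = deletion_based_mus(formula)
print(mus)
```

In this sketch the candidate clause is always the next one in list order; the interactive choice described in the abstract corresponds to letting the user pick which clause to try at each step.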

Language Dynamics and Change | 2018

A new approach to concept basicness and stability as a window to the robustness of concept list rankings

Johannes Dellert; Armin Buch

Based on a recently published large-scale lexicostatistical database, we rank 1,016 concepts by their suitability for inclusion in Swadesh-style lists of basic stable concepts. For this, we define separate measures of basicness and stability. Basicness in the sense of morphological simplicity is measured based on information content, a generalization of word length which corrects for distorting effects of phoneme inventory sizes, phonotactics and non-stem morphemes in dictionary forms. Stability against replacement by semantic shift or borrowing is measured by sampling independent language pairs, and correlating the distances between the forms for the concept with the overall language distances. In order to determine the relative importance of basicness and stability, we optimize our combination of the two partial measures towards similarity with existing lists. A comparison with and among existing rankings suggests that concept rankings are highly data-dependent and therefore less well-grounded than previously assumed. To explore this issue, we evaluate the robustness of our ranking against language pair resampling, allowing us to assess how much volatility can be expected, and showing that only about half of the concepts on a list based on our ranking can safely be assumed to belong on the list independently of the data.
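The idea behind the stability measure, correlating per-concept form distances with overall language distances across language pairs, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not state which correlation coefficient is used, so Pearson correlation is swapped in, the function names are hypothetical, the distance values are invented toy numbers rather than data from the paper, and the sampling of independent language pairs is omitted.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def stability_score(form_distances, language_distances):
    """Hypothetical stability score for one concept.

    Each position corresponds to one sampled language pair. A concept whose
    form distances track the overall language distances scores high.
    """
    return pearson(form_distances, language_distances)


# Invented toy values for five language pairs (not from the paper).
language_distances = [0.1, 0.3, 0.5, 0.7, 0.9]
stable_concept = [0.2, 0.4, 0.5, 0.8, 0.9]    # forms diverge as languages do
unstable_concept = [0.9, 0.2, 0.8, 0.1, 0.7]  # forms unrelated to language distance

stable = stability_score(stable_concept, language_distances)
unstable = stability_score(unstable_concept, language_distances)
print(stable > unstable)
```

Under these assumptions, a concept that is often replaced by borrowing or semantic shift shows form distances that do not follow the overall language distances, which lowers its score.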

Septentrio Conference Series | 2015

Compiling the Uralic Dataset for NorthEuraLex, a Lexicostatistical Database of Northern Eurasia

Johannes Dellert

This paper presents a large comparative lexical database which covers about a thousand concepts across twenty Uralic languages. The dataset will be released as the first part of NorthEuraLex, a lexicostatistical database of Northern Eurasia which is being compiled within the EVOLAEMP project. The chief purpose of the lexical database is to serve as a basis of benchmarks for different tasks within computational historical linguistics, but it might also be valuable to researchers who work on the application of computational methods to open research questions within the language family. The paper describes and motivates the decisions taken concerning data collection methodology, also discussing some of the problems involved in compiling and unifying data from lexical resources in six different gloss languages. The dataset is already publicly available in various PDF formats for inspection and review, and is scheduled for release in machine-readable form in early 2015.

Language Resources and Evaluation | 2008