Jonas Kuhn | Researchain

Publication

Featured research published by Jonas Kuhn.

meeting of the association for computational linguistics | 2000

Lexicalized stochastic modeling of constraint-based grammars using log-linear measures and EM training

Stefan Riezler; Jonas Kuhn; Detlef Prescher; Mark Johnson

We present a new approach to stochastic modeling of constraint-based grammars that is based on log-linear models and uses EM for estimation from unannotated data. The techniques are applied to an LFG grammar for German. Evaluation on an exact match task yields 86% precision for an ambiguity rate of 5.4, and 90% precision on a subcat frame match for an ambiguity rate of 25. Experimental comparison to training from a parsebank shows a 10% gain from EM training. Also, a new class-based grammar lexicalization is presented, showing a 10% gain over unlexicalized models.

meeting of the association for computational linguistics | 2004

Experiments in parallel-text based grammar induction

Jonas Kuhn

This paper discusses the use of statistical word alignment over multiple parallel texts for the identification of string spans that cannot be constituents in one of the languages. This information is exploited in monolingual PCFG grammar induction for that language, within an augmented version of the inside-outside algorithm. Besides the aligned corpus, no other resources are required. We discuss an implemented system and present experimental results with an evaluation against the Penn Treebank.

meeting of the association for computational linguistics | 2014

Learning Structured Perceptrons for Coreference Resolution with Latent Antecedents and Non-local Features

Anders Björkelund; Jonas Kuhn

We investigate different ways of learning structured perceptron models for coreference resolution when using non-local features and beam search. Our experimental results indicate that standard techniques such as early updates or Learning as Search Optimization (LaSO) perform worse than a greedy baseline that only uses local features. By modifying LaSO to delay updates until the end of each instance we obtain significant improvements over the baseline. Our model obtains the best results to date on recent shared task data for Arabic, Chinese, and English.

conference on computational natural language learning | 2009

Data-Driven Dependency Parsing of New Languages Using Incomplete and Noisy Training Data

Kathrin Spreyer; Jonas Kuhn

We present a simple but very effective approach to identifying high-quality data in noisy data sets for structured problems like parsing, by greedily exploiting partial structures. We analyze our approach in an annotation projection framework for dependency trees, and show how dependency parsers from two different paradigms (graph-based and transition-based) can be trained on the resulting tree fragments. We train parsers for Dutch to evaluate our method and to investigate to what degree graph-based and transition-based parsers can benefit from incomplete training data. We find that partial correspondence projection gives rise to parsers that outperform parsers trained on aggressively filtered data sets, and achieve unlabeled attachment scores that are only 5% behind the average UAS for Dutch in the CoNLL-X Shared Task on supervised parsing (Buchholz and Marsi, 2006).

Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications | 2009

Exploiting Translational Correspondences for Pattern-Independent MWE Identification

Sina Zarriess; Jonas Kuhn

Based on a study of verb translations in the Europarl corpus, we argue that a wide range of MWE patterns can be identified in translations that exhibit a correspondence between a single lexical item in the source language and a group of lexical items in the target language. We show that these correspondences can be reliably detected on dependency-parsed, word-aligned sentences. We propose an extraction method that combines word alignment with syntactic filters and is independent of the structural pattern of the translation.

Archive | 2001

Formal and Computational Aspects of Optimality-theoretic Syntax

Jonas Kuhn

In this dissertation, I propose a formal framework for stating in detail a class of Optimality-Theoretic models for syntax. I discuss empirical consequences of the key choices in the formalization and investigate computational properties of the models, in particular decidability of the parsing and generation tasks. The candidate analyses I assume are non-derivational, represented as tuples of parallel representation structures whose elements stand in a correspondence relation in the style of Lexical-Functional Grammar (LFG). The assumption of this type of candidates is motivated by learnability considerations and has the advantage that one can exploit formal and computational results for LFG and related grammar formalisms. The formalization I discuss (in chapter 4) builds on Joan Bresnan’s original proposal of casting Optimality-Theoretic Syntax in an LFG setting (OT-LFG). The set of all possible candidates is specified by a formal LFG-style grammar; a particular candidate set is defined as those possible candidates whose functional (f-)structure is subsumed by an f-structure representing the input. OT constraints are specified as structural description schemata using the primitives of LFG. I discuss details of the status of candidates violating Faithfulness constraints, as they are required to derive expletive elements (like English do) and non-overt elements (like in pro-drop). I argue that in OT-LFG, Faithfulness violations can be modelled very naturally as a tension between a candidate’s f-structure and its categorial structure and lexical material. Thus the subsumption-based definition of candidate sets can be kept up without implying an overly restricted candidate generation function Gen; in this formal model all language differences can be viewed as an effect of constraint (re-)ranking. 
Besides the standard production-based (or expressive) optimization model, I discuss comprehension-based (or interpretive) optimization, in which the terminal string is fixed across the members of the candidate set (chapter 5). Formally, this is only a minor modification of the definition of the candidate set, but there are interesting conceptual and empirical issues concerning parallelism between the two “directions” of optimization, and in particular the combination of both in a bidirectional model. I present a bidirectional account of pro-drop in Italian, which derives a recoverability condition as an effect of the interaction of the two optimizations. Building on computational results for LFG generation, I discuss the processing tasks associated with the two types of uni-directional optimization models and with their combination in a bidirectional system (chapter 6). The two main issues in processing are the control of the infinite candidate set and directionality of processing. I show that generally, the conceptually and empirically well-motivated formalization that I argue for provides a sufficiently restricted basis for a computational account. While parsing (and generation) with an unrestricted OT Syntax system is undecidable in the general case, decidability is guaranteed if either a recoverability condition based on a finite context representation is assumed, or a specific type of bidirectional model (with strong bidirectionality) is applied.

Computational Linguistics | 2013

Morphological and syntactic case in statistical dependency parsing

Wolfgang Seeker; Jonas Kuhn

Most morphologically rich languages with free word order use case systems to mark the grammatical function of nominal elements, especially for the core argument functions of a verb. The standard pipeline approach in syntactic dependency parsing assumes a complete disambiguation of morphological (case) information prior to automatic syntactic analysis. Parsing experiments on Czech, German, and Hungarian show that this approach is susceptible to propagating morphological annotation errors when parsing languages displaying syncretism in their morphological case paradigms. We develop a different architecture where we use case as a possibly underspecified filtering device restricting the options for syntactic analysis. Carefully designed morpho-syntactic constraints can delimit the search space of a statistical dependency parser and exclude solutions that would violate the restrictions overtly marked in the morphology of the words in a given sentence. The constrained system outperforms a state-of-the-art data-driven pipeline architecture, as we show experimentally, and, in addition, the parser output comes with guarantees about local and global morpho-syntactic wellformedness, which can be useful for downstream applications.
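
The idea of using case as a possibly underspecified filter can be illustrated with a small sketch. This is not the constraint system from the paper: the label-to-case table `REQUIRED_CASE`, the function `filter_arcs`, and the toy case sets are all hypothetical and chosen only for illustration. Each word carries the set of cases its form is compatible with (more than one under syncretism), and a candidate arc is discarded only if its label requires a case outside that set.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table: the case each argument label requires (illustrative only).
REQUIRED_CASE: Dict[str, str] = {"SUBJ": "nom", "OBJ": "acc", "IOBJ": "dat"}

Arc = Tuple[int, int, str]  # (head index, dependent index, label)


def filter_arcs(candidates: List[Arc],
                possible_cases: Dict[int, Set[str]]) -> List[Arc]:
    """Keep only arcs compatible with the dependent's possible cases."""
    kept = []
    for head, dep, label in candidates:
        required = REQUIRED_CASE.get(label)
        cases = possible_cases.get(dep)
        # Labels without a case requirement, and words without case
        # information, are left unconstrained.
        if required is None or cases is None or required in cases:
            kept.append((head, dep, label))
    return kept


# Toy input: word 1 is syncretic between nominative and accusative,
# word 2 is unambiguously dative.
possible_cases = {1: {"nom", "acc"}, 2: {"dat"}}
candidates = [
    (0, 1, "SUBJ"), (0, 1, "OBJ"), (0, 1, "IOBJ"),
    (0, 2, "SUBJ"), (0, 2, "IOBJ"),
]
filtered = filter_arcs(candidates, possible_cases)
print(filtered)
```

In this sketch the syncretic word keeps both its subject and object readings, so the choice between them is left to the statistical parser, while readings that contradict the overt morphology are removed before search.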

meeting of the association for computational linguistics | 2009

Improving data-driven dependency parsing using large-scale LFG grammars

Lilja Øvrelid; Jonas Kuhn; Kathrin Spreyer

This paper presents experiments which combine a grammar-driven and a data-driven parser. We show how the conversion of LFG output to dependency representation allows for a technique of parser stacking, whereby the output of the grammar-driven parser supplies features for a data-driven dependency parser. We evaluate on English and German and show significant improvements stemming from the proposed dependency structure as well as various other, deep linguistic features derived from the respective grammars.

meeting of the association for computational linguistics | 2000

Processing optimality-theoretic syntax by interleaved chart parsing and generation

Jonas Kuhn

The Earley deduction algorithm is extended for the processing of OT syntax based on feature grammars. Due to faithfulness violations, infinitely many candidates must be compared. With the (reasonable) assumptions (i) that OT constraints are descriptions denoting bounded structures and (ii) that every rule recursion in the base grammar incurs some constraint violation, a chart algorithm can be devised. Interleaving parsing and generation permits the application of generation-based optimization even in the parsing task, i.e., for a string input.

meeting of the association for computational linguistics | 2014

Visualization, Search, and Error Analysis for Coreference Annotations

Markus Gärtner; Anders Björkelund; Gregor Thiele; Wolfgang Seeker; Jonas Kuhn

We present the ICARUS Coreference Explorer, an interactive tool to browse and search coreference-annotated data. It can display coreference annotations as a tree, as an entity grid, or in a standard text-based display mode, and lets the user switch freely between the different modes. The tool can compare two different annotations on the same document, allowing system developers to evaluate errors in automatic system predictions. It features a flexible search engine, which enables the user to graphically construct search queries over sets of documents annotated with coreference.
