Hagen Hirschmann
Humboldt University of Berlin
Publication
Featured research published by Hagen Hirschmann.
Archive | 2007
Hagen Hirschmann; Seanna Doolittle; Anke Lüdeling
This paper deals with the syntactic annotation of corpora that contain both ‘canonical’ and ‘non-canonical’ sentences. Consider Examples (1) and (2) from the German learner corpus Falko which will be introduced below. (1) represents a syntactically correct (although perhaps not very enlightening) utterance to which it is easy to assign a syntactic structure. The utterance in (2), on the other hand, would be considered incorrect (and probably be interpreted as a word order error) – it is much more difficult to assign a syntactic structure to it. The question is: how can (1) and (2) be annotated in a uniform way that shows that there is a difference and makes clear exactly where that difference lies?
Archive | 2015
Anke Lüdeling; Hagen Hirschmann; Sylviane Granger; Gaëtanelle Gilquin; Fanny Meunier
(10) LU it sleeps inside everyone from the start of being
     TH 1 it sleeps inside everyone since birth
     TH 2 it sleeps inside everyone from the beginning
     TH 3 it sleeps inside everyone UNIDIOMATIC

and says only that this part of the learner utterance is unidiomatic, conflating an implicit target hypothesis with an error tag (the annotator is only able to know that this expression is unidiomatic if he or she knows a more idiomatic expression). Different target hypotheses are not equivalent; a target hypothesis directly influences the following analysis. The Falko corpus consistently has two target hypotheses – the first one deals with clear grammatical errors and the second one also corrects stylistic problems. The need for such an approach becomes clear in (11). The learner utterance in (11) contains a spelling error. The two occurrences of dependance have to be replaced by dependence. From a more abstract perspective, the whole phrase Dependence on gambling sounds unidiomatic if we take into account that the learner wants to refer to a specific kind of addiction. Similarly, dependence on drugs appears to be a marked expression as opposed to drug addiction. An annotation that wants to take this into consideration has to separate the description into the annotation of the spelling error and the annotation of the stylistic error in order not to lose one of the pieces of information. Example (12) illustrates this.

(11) Dependance on gambling is something like dependance on drugs (...) (ICLE-CZ-PRAG-0013.3)

(12) LU Dependance on gambling
     TH 1 Dependence on gambling
     TH 2 Gambling addiction

The examples in this section show how important the step of formulating a target hypothesis is – the subsequent error classification critically depends on this first step. In order to operationalise the first step of the error annotation, one can give guidelines for the formulation of target hypotheses, in addition to the guidelines for assigning error tags, which also need to be evaluated with regard to consistency (see Section 2.6). The problem of unclear error identification has been discussed since the beginning of EA. Milton and Chowdhury (1994) have already suggested that sometimes multiple analyses should be coded in a learner corpus. If the target hypothesis is left implicit or there is only one error analysis, the user is given an error annotation without knowing against which form the utterance was evaluated. In early corpora (pre-multi-layer, pre-XML) it was technically impossible to show the error exponent because errors could only be marked on one token. In corpora that use an XML format it is possible to mark spans, and target hypotheses are sometimes given in the XML mark-up. Only in standoff architectures, however, is it possible to give several competing target hypotheses. Examples of learner corpora with consistent and well-documented (multiple) target hypotheses are the Falko corpus, the trilingual MERLIN corpus (Wisniewski et al. 2013) or the Czech as a Second Language corpus (Rosen et al. 2014).
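To make the standoff idea concrete, the following is a minimal sketch of how the learner utterance from (12) could carry two competing target hypotheses as separate span annotations over the same untouched tokens. It is an illustration only, not the format used by Falko, MERLIN or any other corpus named here; the class names, the layer labels "TH1" and "TH2", and the error tags "spelling" and "style" are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """One annotation over the token range [start, end) on a named layer."""
    layer: str   # hypothetical layer label, e.g. "TH1" or "TH2"
    start: int
    end: int
    target: str  # the target-hypothesis text for this span
    error: str   # hypothetical error tag derived from that hypothesis


@dataclass
class StandoffDoc:
    """The learner tokens are never edited; annotations are stored beside them."""
    tokens: list[str]
    spans: list[Span] = field(default_factory=list)

    def annotate(self, layer: str, start: int, end: int,
                 target: str, error: str) -> None:
        # Reject spans that do not fit the token sequence.
        if not 0 <= start < end <= len(self.tokens):
            raise ValueError("span outside token range")
        self.spans.append(Span(layer, start, end, target, error))

    def covering(self, index: int) -> list[Span]:
        """Return all spans, on any layer, that include the token at index."""
        return [s for s in self.spans if s.start <= index < s.end]


# Learner utterance (LU) from example (12).
doc = StandoffDoc(tokens=["Dependance", "on", "gambling"])

# First hypothesis: only the spelling of the first token is corrected.
doc.annotate("TH1", 0, 1, "Dependence", "spelling")

# Second hypothesis: the whole phrase is replaced by a more idiomatic one.
doc.annotate("TH2", 0, 3, "Gambling addiction", "style")

for span in doc.covering(0):
    print(span.layer, span.target, span.error)
```

Because each error tag is stored together with the hypothesis it was derived from, a user of such a structure can always see against which form the utterance was evaluated, and the two analyses can overlap without either overwriting the other.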
ACM Journal on Computing and Cultural Heritage | 2012
Hagen Hirschmann; Anke Lüdeling; Amir Zeldes
Our article explores the possibilities of using deeply annotated, incrementally evolving comparable corpora for the study of language change, in this case for different stages from Old High German to New High German. Using the example of the evolution of German past tenses, we show how a variety of categories ranging from low to high complexity interact with the choice between competing linguistic variants. To adequately explore the influence of these categories, we use a multilayer corpus architecture that develops together with our study. We show that a combination of quantitative and qualitative analyses can recognize relevant contextual factors, which feed into the addition of new annotation layers applying to the same data. By making our categorizations explicit as corpus annotations and our data available to other researchers, we promote an open, extensible, and transparent mode of research, where both raw data and the inferential process are exposed to other researchers.
Deutsch als Fremdsprache | 2008
Anke Lüdeling; Seanna Doolittle; Hagen Hirschmann; Karin Schmidt; Maik Walter
Archive | 2011
Marc Reznicek; Anke Lüdeling; Hagen Hirschmann
Journal of Second Language Writing | 2015
Nina Vyatkina; Hagen Hirschmann; Felix Golcher
Twenty years of learner corpus research: looking back, moving ahead, 2013, ISBN 978-2-87558-199-0, pp. 223-234 | 2013
Hagen Hirschmann; Anke Lüdeling; Ines Rehbein; Marc Reznicek; Amir Zeldes
Archive | 2010
Anke Lüdeling; Amir Zeldes; Marc Reznicek; Ines Rehbein; Hagen Hirschmann
Archive | 2013
Marc Reznicek; Anke Lüdeling; Hagen Hirschmann
Archive | 2011
Anke Lüdeling; Hagen Hirschmann; Amir Zeldes