Steffen Eger
Goethe University Frankfurt
Publication
Featured research published by Steffen Eger.
American Mathematical Monthly | 2014
Steffen Eger
We derive asymptotic formulas for central extended binomial coefficients, which are generalizations of binomial coefficients, using the distribution of the sum of independent discrete uniform random variables with the Central Limit Theorem and a local limit variant.
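As a rough numerical illustration of the kind of approximation this abstract describes, the sketch below computes a central extended binomial coefficient exactly and compares it with a leading-order estimate. It assumes the usual definition (the coefficient of x^k in (1 + x + ... + x^l)^n) and applies the standard local limit heuristic to a sum of n independent uniform variables on {0, ..., l}. The function names are illustrative, and the estimate is not necessarily the exact formula derived in the paper.

```python
import math


def extended_binomial_row(n, l):
    """Return all coefficients of (1 + x + ... + x^l)^n by repeated convolution."""
    row = [1]
    for _ in range(n):
        new = [0] * (len(row) + l)
        for i, c in enumerate(row):
            for j in range(l + 1):
                new[i + j] += c
        row = new
    return row


def central_approx(n, l):
    """Leading-order estimate of the central coefficient (assumed heuristic).

    A uniform variable on {0, ..., l} has variance ((l + 1)^2 - 1) / 12, so the
    sum of n of them has variance n times that. The local limit heuristic puts
    probability about 1 / sqrt(2 * pi * variance) on the mean, and there are
    (l + 1)^n equally likely outcomes in total.
    """
    variance = n * ((l + 1) ** 2 - 1) / 12.0
    return (l + 1) ** n / math.sqrt(2 * math.pi * variance)


n, l = 40, 2
exact = extended_binomial_row(n, l)[n * l // 2]
approx = central_approx(n, l)
ratio = exact / approx
print(f"exact={exact}, approx={approx:.4e}, ratio={ratio:.4f}")
```

For l = 1 the estimate reduces to the familiar 2^n * sqrt(2 / (pi * n)) approximation of the central binomial coefficient.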
Mathematical Social Sciences | 2016
Steffen Eger
We study a DeGroot-like opinion dynamics model in which agents may oppose other agents. As an underlying motivation, in our setup, agents want to adjust their opinions to match those of the agents of their ‘in-group’ and, in addition, they want to adjust their opinions to match the ‘inverse’ of those of the agents of their ‘out-group’. Our paradigm can account for persistent disagreement in connected societies as well as bi- and multi-polarization. Outcomes depend upon network structure and the choice of deviation function modeling the mode of opposition between agents. For a particular choice of deviation function, which we call soft opposition, we derive necessary and sufficient conditions for long-run polarization. We also consider social influence (who are the opinion leaders in the network?) as well as the question of wisdom in our naive learning paradigm, finding that wisdom is difficult to attain when there exist sufficiently strong negative relations between agents. (Footnote 1: Earlier and more verbose working paper versions of this article can be found at http://arxiv.org/pdf/1306.3134 and the author’s personal website.)
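The update rule sketched in this abstract can be illustrated with a small simulation. The code below is a minimal sketch under stated assumptions, not the paper's model: opinions live in [-1, 1], the 'inverse' of an opinion x is taken to be -x, and opposition is encoded as a negative weight in a signed matrix whose rows have absolute sum 1. The matrix `W`, the starting vector `x0`, and the function names are all hypothetical.

```python
def step(weights, opinions):
    """One round: each agent takes a signed weighted average of all opinions.

    A negative weight means the agent moves toward the inverse (-x) of that
    neighbor's opinion, which is the assumed form of opposition here.
    """
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


def simulate(weights, opinions, rounds=50):
    """Iterate the update rule for a fixed number of rounds."""
    for _ in range(rounds):
        opinions = step(weights, opinions)
    return opinions


# Two groups, {0, 1} and {2, 3}: positive weights inside a group,
# negative weights across groups (toy values).
W = [
    [0.25, 0.25, -0.25, -0.25],
    [0.25, 0.25, -0.25, -0.25],
    [-0.25, -0.25, 0.25, 0.25],
    [-0.25, -0.25, 0.25, 0.25],
]
x0 = [0.8, 0.2, -0.1, 0.5]

final = simulate(W, x0)
print([round(x, 3) for x in final])
```

In this toy network the two groups settle on opinions of equal magnitude and opposite sign, a simple instance of the persistent disagreement in a connected society that the abstract mentions.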
Meeting of the Association for Computational Linguistics | 2016
Steffen Eger; Alexander Mehler
We consider two graph models of semantic change. The first is a time-series model that relates embedding vectors from one time period to embedding vectors of previous time periods. In the second, we construct one graph for each word: nodes in this graph correspond to time points and edge weights to the similarity of the word’s meaning across two time points. We apply our two models to corpora across three different languages. We find that semantic change is linear in two senses. Firstly, today’s embedding vectors (= meaning) of words can be derived as linear combinations of embedding vectors of their neighbors in previous time periods. Secondly, self-similarity of words decays linearly in time. We consider both findings as new laws/hypotheses of semantic change.
The Prague Bulletin of Mathematical Linguistics | 2016
Steffen Eger; Tim vor der Brück; Alexander Mehler
We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple ‘k-best decoding plus dictionary lookup’ strategy performs in this context and find that such an approach can significantly outperform baselines such as edit distance, weighted edit distance, and the noisy channel Brill and Moore model for spelling error correction. We also consider elementary combination techniques for our models such as language model weighted majority voting and center string combination. Finally, we consider real-world OCR post-correction for a dataset sampled from medieval Latin texts.
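The ‘k-best decoding plus dictionary lookup’ idea can be sketched in a few lines. This is only an illustration under assumptions: the ranked candidate list would come from a string-to-string transduction model, which is replaced here by a hard-coded hypothetical list, and falling back to the top-ranked candidate when nothing is in the dictionary is an assumed design choice rather than something the abstract states.

```python
def correct(kbest, dictionary):
    """Return the highest-ranked candidate that is a known word.

    kbest: candidate corrections, best first (hypothetical model output).
    dictionary: set of valid word forms.
    Falls back to the top candidate (or None if there are no candidates).
    """
    for candidate in kbest:
        if candidate in dictionary:
            return candidate
    return kbest[0] if kbest else None


# Toy lexicon and a made-up k-best list for a misspelled Latin word form.
lexicon = {"dominus", "domus", "donum"}
candidates = ["domimus", "dominus", "domus"]

print(correct(candidates, lexicon))
```

The lookup filters out well-scored but non-existent strings, so the model's first choice is overridden only when a lower-ranked candidate is an attested word.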
SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities | 2015
Tim vor der Brück; Steffen Eger; Alexander Mehler
We present a survey of tagging accuracies — concerning part-of-speech and full morphological tagging — for several taggers based on a corpus for medieval church Latin (see www.comphistsem.org). The best tagger in our sample, Lapos, has a PoS tagging accuracy of close to 96% and an overall tagging accuracy (including full morphological tagging) of about 85%. When we ‘intersect’ the taggers with our lexicon, the latter score increases to almost 91% for Lapos. A conservative assessment of lemmatization accuracy on our data estimates a score of 93-94% for a lexicon-based lemmatization strategy and a score of 94-95% for lemmatizing via trained lemmatizers.
International Joint Conference on Natural Language Processing | 2015
Steffen Eger
We investigate multiple many-to-many alignments as a primary step in integrating supplemental information strings in string transduction. Besides outlining DP-based solutions to the multiple alignment problem, we detail an approximation of the problem in terms of multiple sequence segmentations satisfying a coupling constraint. We apply our approach to boosting baseline G2P systems using homogeneous as well as heterogeneous sources of supplemental information.
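To give a feel for the dynamic programming involved, the sketch below counts monotone many-to-many alignments between just two strings. This is a deliberate simplification, not the paper's multiple-alignment algorithm: it assumes each alignment step pairs between 1 and `max_x` symbols of one string with between 1 and `max_y` symbols of the other, and the function name and parameters are illustrative.

```python
def count_alignments(len_x, len_y, max_x=2, max_y=2):
    """Count monotone many-to-many alignments of two strings by length.

    table[i][j] is the number of ways to align the first i symbols of x with
    the first j symbols of y, where every step consumes 1..max_x symbols of x
    and 1..max_y symbols of y (assumed step set).
    """
    table = [[0] * (len_y + 1) for _ in range(len_x + 1)]
    table[0][0] = 1  # the empty prefixes align in exactly one way
    for i in range(len_x + 1):
        for j in range(len_y + 1):
            if i == 0 and j == 0:
                continue
            total = 0
            for a in range(1, min(max_x, i) + 1):
                for b in range(1, min(max_y, j) + 1):
                    total += table[i - a][j - b]
            table[i][j] = total
    return table[len_x][len_y]


# Three symbols against three symbols, segments of length at most 2.
print(count_alignments(3, 3))
```

Replacing the sum with a maximum over scored steps turns the same recursion into a search for the best alignment; aligning more than two strings adds one table dimension per string, which is why approximations become attractive.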
Journal of Quantitative Linguistics | 2013
Steffen Eger
We derive a stochastic word length distribution model based on the concept of compound distributions and show its relationships with and implications for Wimmer et al.’s (1994) synergetic word length distribution model.
Systems and Frameworks for Computational Morphology | 2015
Steffen Eger
We consider the statistical lemmatization problem in which lemmatizers are trained on (word form, lemma) pairs. In particular, we consider this problem for ancient Latin, a language with a high degree of morphological variability. We investigate whether general-purpose string-to-string transduction models are suitable for this task, and find that they typically perform (much) better than more restricted lemmatization techniques/heuristics based on suffix transformations. We also experimentally test whether string transduction systems that perform well on one string-to-string translation task (here, G2P) perform well on another (here, lemmatization) and vice versa, and find that a joint n-gram model performs better on G2P than a discriminative model of our own making but that this relationship is reversed for lemmatization. Finally, we investigate how the learned lemmatizers can complement lexicon-based systems, e.g., by tackling the OOV and/or the disambiguation problem.
Joint Conference on Lexical and Computational Semantics | 2015
Steffen Eger; Niko Schenk; Alexander Mehler
We induce semantic association networks from translation relations in parallel corpora. The resulting semantic spaces are encoded in a single reference language, which ensures cross-language comparability. As our main contribution, we cluster the obtained (cross-lingually comparable) lexical semantic spaces. We find that, in our sample of languages, lexical semantic spaces largely coincide with genealogical relations. To our knowledge, this constitutes the first large-scale quantitative lexical semantic typology that is completely unsupervised, bottom-up, and data-driven. Our results may be important for deciding which multilingual resources to integrate into a semantic evaluation task.
Empirical Methods in Natural Language Processing | 2015
Steffen Eger
We investigate the need for bigram alignment models and the benefit of supervised alignment techniques in grapheme-to-phoneme (G2P) conversion. Moreover, we quantitatively estimate the relationship between alignment quality and overall G2P system performance. We find that, in English, bigram alignment models do perform better than unigram alignment models on the G2P task. Moreover, we find that supervised alignment techniques may perform considerably better than their unsupervised brethren and that few manually aligned training pairs suffice for them to do so. Finally, we estimate that alignment quality has a highly significant impact on overall G2P transcription performance and that this relationship is linear in nature.