Publication


Featured research published by Sara Stymne.


International Conference on Natural Language Processing | 2008

German Compounds in Factored Statistical Machine Translation

Sara Stymne

An empirical method for splitting German compounds is explored by varying it in a number of ways to investigate the consequences for factored statistical machine translation between English and German in both directions. Compound splitting is incorporated into translation in a preprocessing step, performed on training data and on German translation input. For translation into German, compounds are merged based on part-of-speech in a postprocessing step. Compound parts are marked to separate them from ordinary words. Translation quality is improved in both translation directions and the number of untranslated words in the English output is reduced. Different versions of the splitting algorithm perform best in the two translation directions.
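
To make the preprocessing step concrete, the snippet below is a minimal sketch of corpus-frequency-based compound splitting in the spirit of the empirical splitting methods this paper builds on. The toy frequency table, the list of linking elements, and the trailing "#" marker for non-final parts are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of frequency-based compound splitting (illustrative assumptions).
from math import prod

# Hypothetical German corpus frequencies.
FREQ = {"regierung": 120, "konferenz": 80, "regierungskonferenz": 2}

# Linking elements ("Fugenelemente") that may appear between compound parts.
FILLERS = ("", "s", "es")

def split_compound(word, freq, min_len=4):
    """Return the split of `word` into known parts that maximizes the geometric
    mean of corpus frequencies; fall back to the unsplit word."""
    best_parts, best_score = [word], freq.get(word, 1)
    for i in range(min_len, len(word) - min_len + 1):
        first, rest = word[:i], word[i:]
        for filler in FILLERS:
            if filler and not first.endswith(filler):
                continue
            stem = first[: len(first) - len(filler)] if filler else first
            if stem not in freq:
                continue
            tail_parts, _ = split_compound(rest, freq, min_len)
            if any(p not in freq for p in tail_parts):
                continue
            parts = [stem] + tail_parts
            score = prod(freq[p] for p in parts) ** (1.0 / len(parts))
            if score > best_score:
                best_parts, best_score = parts, score
    return best_parts, best_score

parts, _ = split_compound("regierungskonferenz", FREQ)
# Mark non-final parts so they can be merged back after translation.
print([p + "#" if i < len(parts) - 1 else p for i, p in enumerate(parts)])
# -> ['regierung#', 'konferenz']
```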


Conference of the European Chapter of the Association for Computational Linguistics | 2009

A Comparison of Merging Strategies for Translation of German Compounds

Sara Stymne

In this article, compound processing for translation into German in a factored statistical MT system is investigated. Compounds are handled by splitting them prior to training, and merging the parts after translation. I have explored eight merging strategies using different combinations of external knowledge sources, such as word lists, and internal sources that are carried through the translation process, such as symbols or parts-of-speech. I show that for merging to be successful, some internal knowledge source is needed. I also show that an extra sequence model for part-of-speech is useful in order to improve the order of compound parts in the output. The best merging results are achieved by a matching scheme for part-of-speech tags.
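
As a rough illustration of what a part-of-speech matching scheme for merging can look like, the sketch below merges a token marked as a compound part with the following token only when their tags are compatible. The tag set, the trailing "#" marker, and the example sentence are illustrative assumptions rather than the paper's exact scheme.

```python
# Hedged sketch of POS-matching compound merging after translation.

def merge_compounds(tokens, tags):
    """Merge marked compound parts with the following token when the part's tag
    (e.g. "NN-PART") matches the head's tag (e.g. "NN")."""
    out_tokens, out_tags = [], []
    buffer, buffer_tag = "", None
    for token, tag in zip(tokens, tags):
        if token.endswith("#"):                    # compound part: hold it back
            buffer += token[:-1]
            buffer_tag = tag.replace("-PART", "")  # "NN-PART" -> "NN"
            continue
        if buffer:
            if tag == buffer_tag:                  # tags match: merge into one word
                token = buffer + token
            else:                                  # mismatch: emit the part as-is
                out_tokens.append(buffer)
                out_tags.append(buffer_tag)
            buffer, buffer_tag = "", None
        out_tokens.append(token)
        out_tags.append(tag)
    if buffer:                                     # sentence ended on a held part
        out_tokens.append(buffer)
        out_tags.append(buffer_tag)
    return out_tokens, out_tags

tokens = ["die", "regierungs#", "konferenz", "beginnt"]
tags = ["ART", "NN-PART", "NN", "VVFIN"]
print(merge_compounds(tokens, tags)[0])
# -> ['die', 'regierungskonferenz', 'beginnt']
```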


Empirical Methods in Natural Language Processing | 2015

Pronoun-Focused MT and Cross-Lingual Pronoun Prediction: Findings of the 2015 DiscoMT Shared Task on Pronoun Translation

Christian Hardmeier; Preslav Nakov; Sara Stymne; Jörg Tiedemann; Yannick Versley; Mauro Cettolo

We describe the design, the evaluation setup, and the results of the DiscoMT 2015 shared task, which included two subtasks, relevant to both the machine translation (MT) and the discourse communities: (i) pronoun-focused translation, a practical MT task, and (ii) cross-lingual pronoun prediction, a classification task that requires no specific MT expertise and is interesting as a machine learning task in its own right. We focused on the English‐French language pair, for which MT output is generally of high quality, but has visible issues with pronoun translation due to differences in the pronoun systems of the two languages. Six groups participated in the pronoun-focused translation task and eight groups in the cross-lingual pronoun prediction task.
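
To illustrate how the second subtask can be framed, here is a hedged sketch of cross-lingual pronoun prediction as a plain classification problem: given the source pronoun and the target-side words around the pronoun slot, predict the target pronoun class. The feature template, the toy English-French examples, and the scikit-learn baseline are illustrative assumptions, not any system submitted to the shared task.

```python
# Hedged sketch: pronoun prediction as classification (illustrative assumptions).
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(example):
    """Tiny feature template: the source pronoun plus the target words
    immediately before and after the pronoun slot."""
    src_pronoun, tgt_left, tgt_right = example
    return {"src": src_pronoun, "tgt_prev": tgt_left[-1], "tgt_next": tgt_right[0]}

# Toy training data: (source pronoun, left/right target context) -> pronoun class.
train = [
    (("it", ["mais"], ["est", "vrai"]), "ce"),
    (("it", ["et"], ["fonctionne", "bien"]), "il"),
    (("she", ["car"], ["est", "partie"]), "elle"),
]
X = [features(example) for example, _ in train]
y = [label for _, label in train]

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict([features(("it", ["mais"], ["est", "faux"]))]))  # e.g. ['ce']
```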


Workshop on Statistical Machine Translation | 2008

Effects of Morphological Analysis in Translation between German and English

Sara Stymne; Maria Holmqvist; Lars Ahrenberg

We describe the LIU systems for German-English and English-German translation submitted to the Shared Task of the Third Workshop on Statistical Machine Translation. The main features of the systems, as compared with the baseline, are the use of morphological pre- and post-processing, and a sequence model for German using morphologically rich parts-of-speech. It is shown that these additions lead to improved translations.


Workshop on Statistical Machine Translation | 2009

Improving Alignment for SMT by Reordering and Augmenting the Training Corpus

Maria Holmqvist; Sara Stymne; Jody Foo; Lars Ahrenberg

We describe the LIU systems for English-German and German-English translation in the WMT09 shared task. We focus on two methods to improve the word alignment: (i) by applying Giza++ in a second phase to a reordered training corpus, where reordering is based on the alignments from the first phase, and (ii) by adding lexical data obtained as high-precision alignments from a different word aligner. These methods were studied in the context of a system that uses compound processing, a morphological sequence model for German, and a part-of-speech sequence model for English. Both methods gave some improvements to translation quality as measured by Bleu and Meteor scores, though not consistently. All systems used both out-of-domain and in-domain data as the mixed corpus had better scores in the baseline configuration.
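
As a sketch of method (i), the snippet below reorders the source side of a sentence pair so that it follows target word order, based on a first-phase word alignment; a second alignment run then sees a more monotone corpus. The alignment representation (source-target index pairs) and the averaging heuristic are illustrative assumptions, not the exact procedure used in the paper.

```python
# Hedged sketch: reorder source tokens into target order using first-phase alignments.

def reorder_source(src_tokens, alignment):
    """Sort source tokens by the average position of the target words they align
    to; unaligned tokens fall back to their original index."""
    def sort_key(i):
        targets = [t for s, t in alignment if s == i]
        return (sum(targets) / len(targets) if targets else i, i)
    order = sorted(range(len(src_tokens)), key=sort_key)
    return [src_tokens[i] for i in order]

# German "ich habe das Buch gelesen" aligned to English "I have read the book".
src = ["ich", "habe", "das", "Buch", "gelesen"]
alignment = [(0, 0), (1, 1), (2, 3), (3, 4), (4, 2)]
print(reorder_source(src, alignment))
# -> ['ich', 'habe', 'gelesen', 'das', 'Buch']
```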


Workshop on Statistical Machine Translation | 2007

Getting to Know Moses: Initial Experiments on German-English Factored Translation

Maria Holmqvist; Sara Stymne; Lars Ahrenberg

We present results and experiences from our experiments with phrase-based statistical machine translation using Moses. The paper is based on the idea of using an off-the-shelf parser to supply linguistic information to a factored translation model and compares the results of German-English translation to the shared task baseline system based on word form. We report partial results for this model and results for two simplified setups. Our best setup takes advantage of the parser's lemmatization and decompounding. A qualitative analysis of compound translation shows that decompounding improves translation quality.


Computational Linguistics | 2013

Generation of compound words in statistical machine translation into compounding languages

Sara Stymne; Nicola Cancedda; Lars Ahrenberg

In this article we investigate statistical machine translation (SMT) into Germanic languages, with a focus on compound processing. Our main goal is to enable the generation of novel compounds that have not been seen in the training data. We adopt a split-merge strategy, where compounds are split before training the SMT system, and merged after the translation step. This approach reduces sparsity in the training data, but runs the risk of placing translations of compound parts in non-consecutive positions. It also requires a postprocessing step of compound merging, where compounds are reconstructed in the translation output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order and show that it can lead to improvements both by direct inspection and in terms of standard translation evaluation metrics. We also propose several new methods for compound merging, based on heuristics and machine learning, which outperform previously suggested algorithms. These methods can produce novel compounds and a translation with at least the same overall quality as the baseline. For all subtasks we show that it is useful to include part-of-speech based information in the translation process, in order to handle compounds.


Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016

Findings of the 2016 WMT Shared Task on Cross-lingual Pronoun Prediction

Liane Guillou; Christian Hardmeier; Preslav Nakov; Sara Stymne; Jörg Tiedemann; Yannick Versley; Mauro Cettolo; Bonnie Webber

We describe the design, the evaluation setup, and the results of the 2016 WMT shared task on cross-lingual pronoun prediction. This is a classification task in which participants are asked to predict a target-language pronoun from the source-language pronoun and the target-side context in which its translation occurs.


Workshop on Statistical Machine Translation | 2014

Anaphora Models and Reordering for Phrase-Based SMT

Christian Hardmeier; Sara Stymne; Jörg Tiedemann; Aaron Smith; Joakim Nivre

We describe the Uppsala University systems for WMT14. We look at the integration of a model for translating pronominal anaphora and a syntactic dependency projection model for English-French. Furthermore, we investigate post-ordering and tunable POS distortion models for English-German.


Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Vancouver, Canada, August 3-4, 2017, pp. 207-217 | 2017

From Raw Text to Universal Dependencies - Look, No Tags!

Miryam de Lhoneux; Yan Shao; Ali Basirat; Eliyahu Kiperwasser; Sara Stymne; Yoav Goldberg; Joakim Nivre

We present the Uppsala submission to the CoNLL 2017 shared task on parsing from raw text to universal dependencies. Our system is a simple pipeline consisting of two components. The first performs joint sentence and word segmentation; the second is a dependency parser that operates directly on the segmented words, without relying on part-of-speech tags.

Collaboration


Dive into Sara Stymne's collaborations.

Top Co-Authors

Preslav Nakov

Qatar Computing Research Institute

Mauro Cettolo

Fondazione Bruno Kessler

Fabienne Cap

University of Stuttgart
