Ondrej Bojar
Charles University in Prague
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Ondrej Bojar.
workshop on statistical machine translation | 2014
Ondrej Bojar; Christian Buck; Christian Federmann; Barry Haddow; Philipp Koehn; Johannes Leveling; Christof Monz; Pavel Pecina; Matt Post; Herve Saint-Amand; Radu Soricut; Lucia Specia; Aleš Tamchyna
This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams, submitting 57 entries
workshop on statistical machine translation | 2015
Ondrej Bojar; Rajen Chatterjee; Christian Federmann; Barry Haddow; Matthias Huck; Chris Hokamp; Philipp Koehn; Varvara Logacheva; Christof Monz; Matteo Negri; Matt Post; Carolina Scarton; Lucia Specia; Marco Turchi
This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams, submitting 34 entries. The pilot automatic postediting task had a total of 4 teams, submitting 7 entries.
arXiv: Computation and Language | 2016
Jindrich Libovický; Jindrich Helcl; Marek Tlustý; Ondrej Bojar; Pavel Pecina
Neural sequence to sequence learning recently became a very promising paradigm in machine translation, achieving competitive results with statistical phrase-based systems. In this system description paper, we attempt to utilize several recently published methods used for neural sequential learning in order to build systems for WMT 2016 shared tasks of Automatic Post-Editing and Multimodal Machine Translation.
Proceedings of the First Conference on Machine Translation: Volume 2,#N# Shared Task Papers | 2016
Jan-Thorsten Peter; Tamer Alkhouli; Hermann Ney; Matthias Huck; Fabienne Braune; Alexander M. Fraser; Aleš Tamchyna; Ondrej Bojar; Barry Haddow; Rico Sennrich; Frédéric Blain; Lucia Specia; Jan Niehues; Alex Waibel; Alexandre Allauzen; Lauriane Aufrant; Franck Burlot; Elena Knyazeva; Thomas Lavergne; François Yvon; Marcis Pinnis; Stella Frank
This paper describes the joint submission of the QT21 and HimL projects for the English→Romanian translation task of the ACL 2016 First Conference on Machine Translation (WMT 2016). The submission is a system combination which combines twelve different statistical machine translation systems provided by the different groups (RWTH Aachen University, LMU Munich, Charles University in Prague, University of Edinburgh, University of Sheffield, Karlsruhe Institute of Technology, LIMSI, University of Amsterdam, Tilde). The systems are combined using RWTH’s system combination approach. The final submission shows an improvement of 1.0 BLEU compared to the best single system on newstest2016.
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Philip Williams; Rico Sennrich; Maria Nadejde; Matthias Huck; Barry Haddow; Ondrej Bojar
This paper describes the University of Edinburgh’s phrase-based and syntax-based submissions to the shared translation tasks of the ACL 2016 First Conference on Machine Translation (WMT16). We submitted five phrase-based and five syntaxbased systems for the news task, plus one phrase-based system for the biomedical task.
empirical methods in natural language processing | 2016
Alexandra Birch; Omri Abend; Ondrej Bojar; Barry Haddow
Human evaluation of machine translation normally uses sentence-level measures such as relative ranking or adequacy scales. However, these provide no insight into possible errors, and do not scale well with sentence length. We argue for a semantics-based evaluation, which captures what meaning components are retained in the MT output, thus providing a more fine-grained analysis of translation quality, and enabling the construction and tuning of semantics-based MT. We present a novel human semantic evaluation measure, Human UCCA-based MT Evaluation (HUME), building on the UCCA semantic representation scheme. HUME covers a wider range of semantic phenomena than previous methods and does not rely on semantic annotation of the potentially garbled MT output. We experiment with four language pairs, demonstrating HUMEs broad applicability, and report good inter-annotator agreement rates and correlation with human adequacy scores.
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Rudolf Rosa; Roman Sudarikov; Michal Novák; Martin Popel; Ondrej Bojar
We describe our submission to the ITdomain translation task of WMT 2016. We perform domain adaptation with dictionary data on already trained MT systems with no further retraining. We apply our approach to two conceptually different systems developed within the QTLeap project: TectoMT and Moses, as well as Chimera, their combination. In all settings, our method improves the translation quality. Moreover, the basic variant of our approach is applicable to any MT system, including a black-box one.
workshop on hybrid approaches to translation | 2015
Aleš Tamchyna; Ondrej Bojar
We present a thorough analysis of a combination of a statistical and a transferbased system for English→Czech translation, Moses and TectoMT. We describe several techniques for inspecting such a system combination which are based both on automatic and manual evaluation. While TectoMT often produces bad translations, Moses is still able to select the good parts of them. In many cases, TectoMT provides useful novel translations which are otherwise simply unavailable to the statistical component, despite the very large training data. Our analyses confirm the expected behaviour that TectoMT helps with preserving grammatical agreements and valency requirements, but that it also improves a very diverse set of other phenomena. Interestingly, including the outputs of the transfer-based system in the phrase-based search seems to have a positive effect on the search space. Overall, we find that the components of this combination are complementary and the final system produces significantly better translations than either component by itself.
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Aleš Tamchyna; Roman Sudarikov; Ondrej Bojar; Alexander M. Fraser
This paper describes the phrase-based systems jointly submitted by CUNI and LMU to English-Czech and English-Romanian News translation tasks of WMT16. In contrast to previous years, we strictly limited our training data to the constraint datasets, to allow for a reliable comparison with other research systems. We experiment with using several additional models in our system, including a feature-rich discriminative model of phrasal translation.
workshop on statistical machine translation | 2015
Milos Stanojevic; Amir Kamran; Ondrej Bojar
This paper presents the results of the WMT15 Tuning Shared Task. We provided the participants of this task with a complete machine translation system and asked them to tune its internal parameters (feature weights). The tuned systems were used to translate the test set and the outputs were manually ranked for translation quality. We received 4 submissions in the English-Czech and 6 in the Czech-English translation direction. In addition, we ran 3 baseline setups, tuning the parameters with standard optimizers for BLEU score.