Publication


Featured research published by Mark Fishel.


The Prague Bulletin of Mathematical Linguistics | 2011

Addicter: What is wrong with my translations?

Daniel Zeman; Mark Fishel; Jan Berka; Ondřej Bojar

We introduce Addicter, a tool for Automatic Detection and DIsplay of Common Translation ERrors. The tool makes it possible to automatically identify and label translation errors and to browse the test and training corpus and word alignments; the use of additional linguistic tools is also supported. The error classification is inspired by that of Vilar et al. (2006), although some of their higher-level categories are beyond the reach of the current version of our system. In addition to the tool itself, we present a comparison of the proposed method to manually classified translation errors and a thorough evaluation of the generated alignments.


Text, Speech and Dialogue | 2011

Automatic translation error analysis

Mark Fishel; Ondřej Bojar; Daniel Zeman; Jan Berka

We propose a method of automatic identification of various error types in machine translation output. The approach is mostly based on monolingual word alignment of the hypothesis and the reference translation. In addition to common lexical errors, misplaced words are also detected. A comparison to manually classified MT errors is presented. Our error classification is inspired by that of Vilar et al. (2006), although distinguishing some of their categories is beyond the reach of the current version of our system.
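As a rough illustration of the idea in this abstract, the sketch below aligns a hypothesis to a reference by exact word match and derives three error lists from that alignment. It is a minimal sketch under stated assumptions, not the authors' method: the function names, the greedy exact-match alignment, and the use of a longest increasing subsequence to flag misplaced words are all choices made here for illustration.

```python
def longest_increasing_subsequence(values):
    """Return indices (into values) of one longest strictly increasing subsequence."""
    if not values:
        return []
    best_len = [1] * len(values)
    prev = [-1] * len(values)
    for i in range(len(values)):
        for k in range(i):
            if values[k] < values[i] and best_len[k] + 1 > best_len[i]:
                best_len[i] = best_len[k] + 1
                prev[i] = k
    end = max(range(len(values)), key=best_len.__getitem__)
    keep = []
    while end != -1:
        keep.append(end)
        end = prev[end]
    return keep[::-1]


def classify_errors(hypothesis, reference):
    """Hypothetical helper: label missing, extra and misplaced words."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()

    # Assumption: greedy one-to-one alignment on exact word identity.
    ref_positions = {}
    for j, word in enumerate(ref):
        ref_positions.setdefault(word, []).append(j)

    alignment = []  # (hypothesis index, reference index)
    extra = []      # hypothesis words with no counterpart in the reference
    for i, word in enumerate(hyp):
        if ref_positions.get(word):
            alignment.append((i, ref_positions[word].pop(0)))
        else:
            extra.append(word)

    aligned_ref = {j for _, j in alignment}
    missing = [w for j, w in enumerate(ref) if j not in aligned_ref]

    # Aligned words outside the longest order-preserving chain count as misplaced.
    in_order = set(longest_increasing_subsequence([j for _, j in alignment]))
    misplaced = [hyp[i] for k, (i, _) in enumerate(alignment) if k not in in_order]

    return {"missing": missing, "extra": extra, "misplaced": misplaced}


if __name__ == "__main__":
    print(classify_errors("the cat black sat on mat",
                          "the black cat sat on the mat"))
```

A real system would need a more tolerant alignment (lemmas, synonyms, inflection errors); exact matching is used here only to keep the example short.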


Perspectives: Studies in Translatology | 2013

Parallel subtitle corpora and their applications in machine translation and translatology

Lindsay Bywood; Martin Volk; Mark Fishel; Panayota Georgakopoulou

SUMAT is a project funded through the EU ICT Policy Support Programme (2011–2014). It involves four subtitling companies (InVision, DDS, Titelbild, VSI) and five technical partners (ALS, ATC, TextShuttle, University of Maribor, Vicomtech). For the SUMAT project, translated subtitles for seven language pairs have been collected. Four subtitling companies have contributed to this effort, which has so far resulted in collections numbering between 200,000 and 2 million subtitles per language pair. This paper describes the process of converting, classifying and aligning the subtitles. Conversion to a common text format and cross-language alignment were done automatically, using specially built converters, whilst classifying the subtitles according to text genre was a manual process, performed by the teams harvesting the subtitles. The resulting subtitle corpora are perfectly suited for various applications. The focus of the SUMAT project is to use them as training material for statistical machine translation systems, and this paper will report on the initial experiences with some of the language pairs. In addition, the parallel corpora may serve as input data for parallel concordancing systems. As part of the project, a small prototype has been built which shows how word-aligned parallel subtitles offer new insights for translation science.


Meeting of the Association for Computational Linguistics | 2015

Leveraging Compounds to Improve Noun Phrase Translation from Chinese and German

Xiao Pu; Laura Mascarell; Mark Fishel; Ngoc-Quang Luong; Martin Volk

This paper presents a method to improve the translation of polysemous nouns, when a previous occurrence of the noun as the head of a compound noun phrase is available in a text. The occurrences are identified through pattern matching rules, which detect XY compounds followed closely by a potentially coreferent occurrence of Y, such as “Nordwand ... Wand”. Two strategies are proposed to improve the translation of the second occurrence of Y: re-using the cached translation of Y from the XY compound, or post-editing the translation of Y using the head of the translation of XY. Experiments are performed on Chinese-to-English and German-to-French statistical machine translation, over the WIT3 and Text+Berg corpora respectively, with 261 XY/Y pairs each. The results suggest that while the overall BLEU scores increase only slightly, the translations of the targeted polysemous nouns are significantly improved.
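The XY/Y detection step described in this abstract can be pictured with a small sketch. The abstract does not give the actual pattern matching rules, so everything below is an assumption made for illustration: the function name, the token window, the minimum head length, and the plain suffix test (which suits closed German compounds like “Nordwand” and would not carry over to Chinese as written).

```python
import re


def find_compound_pairs(text, window=20, min_head_len=3):
    """Hypothetical helper: find (compound, head, distance) candidates.

    A candidate is a token XY followed, within `window` tokens, by a
    shorter token Y that XY ends with (compared case-insensitively).
    """
    tokens = re.findall(r"\w+", text)
    pairs = []
    for i, compound in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            head = tokens[j]
            if (len(head) >= min_head_len
                    and len(compound) > len(head)
                    and compound.lower().endswith(head.lower())):
                pairs.append((compound, head, j - i))
    return pairs


if __name__ == "__main__":
    sample = "Die Nordwand ist steil. Die Wand wurde 1938 bestiegen."
    print(find_compound_pairs(sample))
```

A suffix test alone will over-generate (any word that happens to end in another word matches), so in practice the candidates would still need filtering, for example by part of speech or a compound splitter.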


The Prague Bulletin of Mathematical Linguistics | 2017

Open-Source Neural Machine Translation API Server

Sander Tars; Kaspar Papli; Dmytro Chasovskyi; Mark Fishel

We introduce an open-source implementation of a machine translation API server. The aim of this software package is to enable anyone to run their own multi-engine translation server with neural machine translation engines, supporting an open API for client applications. Besides the hub with the implementation of the client API and the translation service providers running in the background, we also describe an open-source demo web application that uses our software package and implements an online translation tool that supports collecting translation quality comparisons from users.


The Prague Bulletin of Mathematical Linguistics | 2017

Visualizing Neural Machine Translation Attention and Confidence

Matīss Rikters; Mark Fishel; Ondřej Bojar

In this article, we describe a tool for visualizing the output and attention weights of neural machine translation systems and for estimating confidence about the output based on the attention. Our aim is to help researchers and developers better understand the behaviour of their NMT systems without the need for any reference translations. Our tool includes command line and web-based interfaces that allow systematic evaluation of translation outputs from various engines and experiments. We also present a web demo of our tool with examples of good and bad translations: http://ej.uz/nmt-attention.


Empirical Methods in Natural Language Processing | 2015

Detecting Document-level Context Triggers to Resolve Translation Ambiguity

Laura Mascarell; Mark Fishel; Martin Volk

Most current machine translation systems translate each sentence independently, ignoring the context from previous sentences. This discourse unawareness can lead to incorrect translation of words or phrases that are ambiguous in the sentence. For example, the German term Typen in the phrase diese Typen can be translated either into English types or guys. However, knowing that it co-refers to the compound Körpertypen (“body types”) in the previous sentence helps to disambiguate the term and translate it into types. We propose a method of automatically detecting document-level trigger words (like Körpertypen), whose presence helps to disambiguate translations of ambiguous terms. In this preliminary study we analyze the method and its limitations, and outline future work directions.


The Prague Bulletin of Mathematical Linguistics | 2010

CorporAl: a Method and Tool for Handling Overlapping Parallel Corpora

Mark Fishel; Heiki-Jaan Kaalep

This work introduces a method and tool for handling overlapping parallel corpora, i.e. corpora that are based on the same source material. The method is insensitive to minor changes in the text, different segmentation levels of the corpora and material omitted from either corpus. The aim is to detect matching sentence pairs and either produce combinations of the overlapping corpora or compare them and assess their quality in comparison to each other. The introduced tool enables the user to define the desired behavior when combining corpus pairs, resulting in pure comparison, maximum-size or maximum-quality versions of the combinations. We test the tool on two cases of overlapping parallel corpora and five language pairs. We also evaluate the impact of using the method on two translation systems: a phrase-based and a parsing-based one.


MT Summit XIV Workshop on Post-editing Technology and Practice, Nice, 2 September 2013, pp. 83-91 | 2013

Assessing Post-Editing Efficiency in a Realistic Translation Environment

Samuel Läubli; Mark Fishel; Gary Massey; Maureen Ehrensberger-Dow; Martin Volk


Language Resources and Evaluation | 2012

Terra: a Collection of Translation Error-Annotated Corpora

Mark Fishel; Ondřej Bojar; Maja Popović

Collaboration


Dive into Mark Fishel's collaboration.

Top Co-Authors

Ondřej Bojar

Charles University in Prague

Lindsay Bywood

University College London

Daniel Zeman

Charles University in Prague

Jan Berka

Charles University in Prague