Lucia Specia
University of Sheffield
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Lucia Specia.
european semantic web conference | 2007
Lucia Specia; Enrico Motta
While tags in collaborative tagging systems serve primarily an indexing purpose, facilitating search and navigation of resources, the use of the same tags by more than one individual can yield a collective classification schema. We present an approach for making explicit the semantics behind the tag space in social tagging systems, so that this collaborative organization can emerge in the form of groups of concepts and partial ontologies. This is achieved by using a combination of shallow pre-processing strategies and statistical techniques together with knowledge provided by ontologies available on the semantic web. Preliminary results on the del.icio.us and Flickr tag sets show that the approach is very promising: it generates clusters with highly related tags corresponding to concepts in ontologies and meaningful relationships among subsets of these tags can be identified.
workshop on statistical machine translation | 2014
Ondrej Bojar; Christian Buck; Christian Federmann; Barry Haddow; Philipp Koehn; Johannes Leveling; Christof Monz; Pavel Pecina; Matt Post; Herve Saint-Amand; Radu Soricut; Lucia Specia; Aleš Tamchyna
This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams, submitting 57 entries
Machine Translation | 2010
Lucia Specia; Dhwaj Raj; Marco Turchi
Most evaluation metrics for machine translation (MT) require reference translations for each sentence in order to produce a score reflecting certain aspects of its quality. The de facto metrics, BLEU and NIST, are known to have good correlation with human evaluation at the corpus level, but this is not the case at the segment level. As an attempt to overcome these two limitations, we address the problem of evaluating the quality of MT as a prediction task, where reference-independent features are extracted from the input sentences and their translation, and a quality score is obtained based on models produced from training data. We show that this approach yields better correlation with human evaluation as compared to commonly used metrics, even with models trained on different MT systems, language-pairs and text domains.
workshop on statistical machine translation | 2015
Ondrej Bojar; Rajen Chatterjee; Christian Federmann; Barry Haddow; Matthias Huck; Chris Hokamp; Philipp Koehn; Varvara Logacheva; Christof Monz; Matteo Negri; Matt Post; Carolina Scarton; Lucia Specia; Marco Turchi
This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams, submitting 34 entries. The pilot automatic postediting task had a total of 4 teams, submitting 7 entries.
meeting of the association for computational linguistics | 2016
Ondˇrej Bojar; Rajen Chatterjee; Christian Federmann; Yvette Graham; Barry Haddow; Matthias Huck; Antonio Jimeno Yepes; Philipp Koehn; Varvara Logacheva; Christof Monz; Matteo Negri; Aurélie Névéol; Mariana L. Neves; Martin Popel; Matt Post; Raphael Rubino; Carolina Scarton; Lucia Specia; Marco Turchi; Karin Verspoor; Marcos Zampieri
This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), and an automatic post-editing task and bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions and the Biomedical task received 15 submissions systems from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams, submitting 39 entries. The automatic post-editing task had a total of 6 teams, submitting 11 entries.
document engineering | 2008
Sandra Maria Aluísio; Lucia Specia; Thiago Alexandre Salgueiro Pardo; Erick Galani Maziero; Renata Pontin de Mattos Fortes
In this paper we investigate the main linguistic phenomena that can make texts complex and how they could be simplified. We focus on a corpus analysis of simple account texts available on the web for Brazilian Portuguese and propose simplification strategies for this language. This study illustrates the need for text simplification to facilitate accessibility to information by poor literacy readers and potentially by people with other cognitive disabilities. It also highlights characteristics of simplification for Portuguese, which may differ from other languages. Such study consists of the first step towards building Brazilian Portuguese text simplification systems. One of the scenarios in which these systems could be used is that of reading electronic texts produced, e.g., by the Brazilian government or by relevant news agencies.
processing of the portuguese language | 2010
Lucia Specia
We address the problem of simplifying Portuguese texts at the sentence level by treating it as a “translation task”. We use the Statistical Machine Translation (SMT) framework to learn how to translate from complex to simplified sentences. Given a parallel corpus of original and simplified texts, aligned at the sentence level, we train a standard SMT system and evaluate the “translations” produced using both standard SMT metrics like BLEU and manual inspection. Results are promising according to both evaluations, showing that while the model is usually overcautious in producing simplifications, the overall quality of the sentences is not degraded and certain types of simplification operations, mainly lexical, are appropriately captured.
international joint conference on natural language processing | 2009
Shachar Mirkin; Lucia Specia; Nicola Cancedda; Ido Dagan; Marc Dymetman; Idan Szpektor
This paper addresses the task of handling unknown terms in SMT. We propose using source-language monolingual models and resources to paraphrase the source text prior to translation. We further present a conceptual extension to prior work by allowing translations of entailed texts rather than paraphrases only. A method for performing this process efficiently is presented and applied to some 2500 sentences with unknown terms. Our experiments show that the proposed approach substantially increases the number of properly translated texts.
workshop on innovative use of nlp for building educational applications | 2009
Arnaldo Candido; Erick Galani Maziero; Lucia Specia; Caroline Gasperin; Thiago Alexandre Salgueiro Pardo; Sandra Maria Aluísio
In this paper we investigate the task of text simplification for Brazilian Portuguese. Our purpose is three-fold: to introduce a simplification tool for such language and its underlying development methodology, to present an on-line authoring system of simplified text based on the previous tool, and finally to discuss the potentialities of such technology for education. The resources and tools we present are new for Portuguese and innovative in many aspects with respect to previous initiatives for other languages.
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Lucia Specia; Stella Frank; Khalil Sima'an; Desmond Elliott
This paper introduces and summarises the findings of a new shared task at the intersection of Natural Language Processing and Computer Vision: the generation of image descriptions in a target language, given an image and/or one or more descriptions in a different (source) language. This challenge was organised along with the Conference on Machine Translation (WMT16), and called for system submissions for two task variants: (i) a translation task, in which a source language image description needs to be translated to a target language, (optionally) with additional cues from the corresponding image, and (ii) a description generation task, in which a target language description needs to be generated for an image, (optionally) with additional cues from source language descriptions of the same image. In this first edition of the shared task, 16 systems were submitted for the translation task and seven for the image description task, from a total of 10 teams.