Stella Frank
University of Edinburgh
Publications
Featured research published by Stella Frank.
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Lucia Specia; Stella Frank; Khalil Sima'an; Desmond Elliott
This paper introduces and summarises the findings of a new shared task at the intersection of Natural Language Processing and Computer Vision: the generation of image descriptions in a target language, given an image and/or one or more descriptions in a different (source) language. This challenge was organised along with the Conference on Machine Translation (WMT16), and called for system submissions for two task variants: (i) a translation task, in which a source language image description needs to be translated to a target language, (optionally) with additional cues from the corresponding image, and (ii) a description generation task, in which a target language description needs to be generated for an image, (optionally) with additional cues from source language descriptions of the same image. In this first edition of the shared task, 16 systems were submitted for the translation task and seven for the image description task, from a total of 10 teams.
Meeting of the Association for Computational Linguistics | 2016
Desmond Elliott; Stella Frank; Khalil Sima'an; Lucia Specia
We introduce the Multi30K dataset to stimulate multilingual multimodal research. Recent advances in image description have been demonstrated on English-language datasets almost exclusively, but image description should not be limited to English. This dataset extends the Flickr30K dataset with i) German translations created by professional translators over a subset of the English descriptions, and ii) descriptions crowdsourced independently of the original English descriptions. We outline how the data can be used for multilingual image description and multimodal machine translation, but we anticipate the data will be useful for a broader range of tasks.
Meeting of the Association for Computational Linguistics | 2014
Stella Frank; Naomi H. Feldman; Sharon Goldwater
Learning phonetic categories is one of the first steps to learning a language, yet is hard to do using only distributional phonetic information. Semantics could potentially be useful, since words with different meanings have distinct phonetics, but it is unclear how many word meanings are known to infants learning phonetic categories. We show that attending to a weaker source of semantics, in the form of a distribution over topics in the current context, can lead to improvements in phonetic category learning. In our model, an extension of a previous model of joint word-form and phonetic category inference, the probability of word-forms is topic-dependent, enabling the model to find significantly better phonetic vowel categories and word-forms than a model with no semantic knowledge.
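The core intuition of the topic-dependent model can be illustrated with a toy sketch (the word-forms, topics, and probability values below are invented for illustration, not taken from the paper): when the acoustics are ambiguous between two word-forms, a topic-dependent prior over word-forms resolves the ambiguity.

```python
# Toy sketch: topic context disambiguates phonetically similar word-forms.
# P(word-form | topic) differs across topics, so even ambiguous acoustics
# can be resolved. All numbers here are hypothetical.
p_wordform_given_topic = {
    "animals": {"sheep": 0.7, "ship": 0.3},
    "travel":  {"sheep": 0.2, "ship": 0.8},
}

def posterior_wordform(acoustic_like, topic):
    """Combine acoustic likelihoods with a topic-dependent word-form prior."""
    prior = p_wordform_given_topic[topic]
    joint = {w: acoustic_like[w] * prior[w] for w in prior}
    z = sum(joint.values())
    return {w: p / z for w, p in joint.items()}

# An ambiguous vowel midway between /i/ and /I/: equal acoustic likelihoods.
post = posterior_wordform({"sheep": 0.5, "ship": 0.5}, topic="animals")
print(post)  # the "animals" topic pulls the posterior toward "sheep"
```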
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Jan-Thorsten Peter; Tamer Alkhouli; Hermann Ney; Matthias Huck; Fabienne Braune; Alexander M. Fraser; Aleš Tamchyna; Ondrej Bojar; Barry Haddow; Rico Sennrich; Frédéric Blain; Lucia Specia; Jan Niehues; Alex Waibel; Alexandre Allauzen; Lauriane Aufrant; Franck Burlot; Elena Knyazeva; Thomas Lavergne; François Yvon; Marcis Pinnis; Stella Frank
This paper describes the joint submission of the QT21 and HimL projects for the English→Romanian translation task of the ACL 2016 First Conference on Machine Translation (WMT 2016). The submission is a system combination which combines twelve different statistical machine translation systems provided by the different groups (RWTH Aachen University, LMU Munich, Charles University in Prague, University of Edinburgh, University of Sheffield, Karlsruhe Institute of Technology, LIMSI, University of Amsterdam, Tilde). The systems are combined using RWTH’s system combination approach. The final submission shows an improvement of 1.0 BLEU compared to the best single system on newstest2016.
Topics in Cognitive Science | 2013
Stella Frank; Sharon Goldwater; Frank Keller
The acquisition of syntactic categories is a crucial step in the process of acquiring syntax. At this stage, before a full grammar is available, only surface cues are available to the learner. Previous computational models have demonstrated that local contexts are informative for syntactic categorization. However, local contexts are affected by sentence-level structure. In this paper, we add sentence type as an observed feature to a model of syntactic category acquisition, based on experimental evidence showing that pre-syntactic children are able to distinguish sentence type using prosody and other cues. The model, a Bayesian Hidden Markov Model, allows for adding sentence type in a few different ways; we find that sentence type can aid syntactic category acquisition if it is used to characterize the differences in word order between sentence types. In these models, knowledge of sentence type permits similar gains to those found by extending the local context.
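One way the abstract's idea can be sketched is an HMM whose transition distributions are conditioned on an observed sentence type, capturing word-order differences between, say, declaratives and questions. This is a minimal illustrative sketch with invented dimensions and random parameters, not the paper's model or inference procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 3 latent syntactic categories, 2 sentence types.
n_states, n_types = 3, 2

# trans[t] is the state-transition matrix used in sentences of type t,
# so different sentence types can encode different word-order preferences.
trans = rng.dirichlet(np.ones(n_states), size=(n_types, n_states))

def sample_tag_sequence(sent_type, length, start_state=0):
    """Sample a sequence of hidden category states for one sentence."""
    states = [start_state]
    for _ in range(length - 1):
        probs = trans[sent_type, states[-1]]
        states.append(int(rng.choice(n_states, p=probs)))
    return states

tags = sample_tag_sequence(sent_type=1, length=5)
```

The design point is simply that sentence type is observed (via prosody and other cues), so it can index the transition parameters rather than being inferred.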
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Cuong Hoang; Stella Frank; Khalil Sima'an
This paper describes Scorpio, the ILLC-UvA Adaptation System submitted to the IT-DOMAIN translation task at WMT 2016, which participated with the language pair of English-Dutch. This system consolidates the ideas in our previous work on latent variable models for adaptation, and demonstrates their effectiveness in a competitive setting.
Proceedings of the 12th International Conference on the Evolution of Language (Evolang12) | 2018
Stella Frank; Kenny Smith
Languages with large numbers of adult learners tend to be less morphosyntactically complex than languages where adult learners are rare (Wray & Grace, 2007; Lupyan & Dale, 2010; Bentz & Winter, 2013; Trudgill, 2011). This correlation between the composition of populations and linguistic complexity is often attributed to deficiencies in adult language learning. Here we investigate an additional or alternative mechanism: rational accommodation by native speakers to non-native interlocutors. Humans have a general aptitude for reasoning about the knowledge, beliefs and motivations of other individuals, including their linguistic knowledge (e.g. Clark, 1996; Ferguson, 1981). While our interlocutors' linguistic knowledge will often be close to our own, this may not be the case in a population with many non-native speakers. We introduce a rational model of interactions between individuals capable of reasoning about the linguistic knowledge of others, and investigate the case of a non-native speaker interacting with a native speaker who reasons about their linguistic knowledge and accommodates accordingly. Our model shows that this accommodation mechanism can lead to the non-native speaker acquiring a language variant that is less complex than the original language.
We assume a simple model in which a language consists of a distribution over linguistic variants (e.g. past tense forms). Language simplification is modelled as regularisation, whereby the most frequent variant becomes more frequent; this corresponds to, and can be measured as, entropy reduction. We model the interaction between a non-native speaker and a native speaker as interaction between two rational (Bayesian) agents. Both agents have the same initial priors and update their beliefs about the language from data in the same way, but the non-native speaker has simply seen much less data.
Within an interaction, the native speaker has a parametrisable tendency to accommodate to the non-native speaker: instead of simply using their own language, they use the version of the language that they believe the non-native speaker may have acquired at this stage of their learning, given limited exposure. Importantly, the native speaker does not know exactly what data the non-native speaker has seen; instead, the native speaker models the non-native speaker's expected knowledge.
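The measure of simplification used above, regularisation as entropy reduction, can be made concrete with a small sketch (the variant frequencies below are hypothetical, not data from the paper):

```python
import math
from collections import Counter

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def normalise(counts):
    total = sum(counts.values())
    return [c / total for c in counts.values()]

# Hypothetical variant frequencies for one linguistic item (e.g. competing
# past tense forms) in the native input language.
native_input = Counter({"form_a": 6, "form_b": 3, "form_c": 1})

# A regularised variant of the same language: the most frequent form
# becomes more frequent, a rarer form drops out.
regularised = Counter({"form_a": 9, "form_b": 1})

h_in = entropy(normalise(native_input))
h_out = entropy(normalise(regularised))
# Simplification is measurable as entropy reduction: h_out < h_in.
print(f"input entropy:  {h_in:.3f} bits")
print(f"output entropy: {h_out:.3f} bits")
```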
Natural Language Engineering | 2018
Stella Frank; Desmond Elliott; Lucia Specia
Two studies on multilingual multimodal image description provide empirical evidence towards two hypotheses at the core of the task: (i) whether target language speakers prefer descriptions generated directly in their native language, as compared to descriptions translated from a different language; (ii) the role of the image in human translation of descriptions. These results provide guidance for future work in multimodal natural language processing by firstly showing that on the whole, translations are not distinguished from native language descriptions, and secondly delineating and quantifying the information gained from the image during the human translation task.
Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL) | 2014
Stella Frank
We perform hyperparameter inference within a model of morphology learning (Goldwater et al., 2011) and find that it affects model behaviour drastically. Changing the model structure successfully avoids the unsegmented solution, but results in oversegmentation instead.
arXiv: Computation and Language | 2015
Desmond Elliott; Stella Frank; Eva Hasler