Laura Perez-Beltrachini
University of Lorraine
Publications
Featured research published by Laura Perez-Beltrachini.
Meeting of the Association for Computational Linguistics | 2017
Claire Gardent; Anastasia Shimorina; Shashi Narayan; Laura Perez-Beltrachini
In this paper, we focus on how to create data-to-text corpora that can support the learning of wide-coverage micro-planners, i.e., generation systems that handle lexicalisation, aggregation, surface realisation, sentence segmentation and referring expression generation. We start by reviewing common practice in designing training benchmarks for Natural Language Generation (NLG). We then present a novel framework for semi-automatically creating linguistically challenging NLG corpora from existing knowledge bases (KBs). We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while the dataset of Wen et al. (2016) is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models capable of generating text from KB data can be learned.
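The abstract compares datasets by input diversity and text diversity without naming the measures. As a rough illustration only, the sketch below assumes two simple proxies: the number of distinct properties and distinct property patterns on the input side, and a type-token ratio on the text side. These are not necessarily the measures used in the paper, and `input_diversity` and `type_token_ratio` are hypothetical helpers.

```python
from typing import List, Tuple

# Assumed representation: each input is a list of (subject, property, object) triples.
Triple = Tuple[str, str, str]


def input_diversity(inputs: List[List[Triple]]) -> Tuple[int, int]:
    """Hypothetical proxy: (distinct properties, distinct property patterns)."""
    properties = {p for triples in inputs for (_, p, _) in triples}
    # A "pattern" here is the sorted tuple of properties in one input set.
    patterns = {tuple(sorted(p for (_, p, _) in triples)) for triples in inputs}
    return len(properties), len(patterns)


def type_token_ratio(texts: List[str]) -> float:
    """Hypothetical proxy for text diversity: distinct tokens / total tokens."""
    tokens = [tok.lower() for text in texts for tok in text.split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


inputs = [
    [("A", "p", "x")],
    [("B", "p", "y")],
    [("A", "p", "x"), ("A", "q", "z")],
]
print(input_diversity(inputs))             # (2, 2)
print(type_token_ratio(["a b a", "b c"]))  # 0.6
```

Higher values on both proxies would indicate a more varied corpus; real comparisons would normally also control for dataset size, since type-token ratio drops as corpora grow.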
International Conference on Natural Language Generation | 2016
Emilie Colin; Claire Gardent; Yassine Mrabet; Shashi Narayan; Laura Perez-Beltrachini
With the emergence of the linked data initiative and the rapid development of RDF (Resource Description Framework) datasets, several approaches have recently been proposed for generating text from RDF data (Sun and Mellish, 2006; Duma and Klein, 2013; Bontcheva and Wilks, 2004; Cimiano et al., 2013; Lebret et al., 2016). To support the evaluation and comparison of such systems, we propose a shared task on generating text from DBpedia data. The training data will consist of data/text pairs where the data is a set of triples extracted from DBpedia and the text is a verbalisation of these triples. In essence, the task consists of mapping data to text. Specific subtasks include sentence segmentation (how to chunk the input data into sentences), lexicalisation (of the DBpedia properties), aggregation (how to avoid repetitions) and surface realisation (how to build a syntactically correct and natural-sounding text).
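A minimal sketch of what a data/text pair and the sentence segmentation subtask might look like, assuming a simple in-memory representation. The `Triple` and `DataTextPair` classes, the `segment_by_subject` heuristic and the toy example are hypothetical; they do not reflect the shared task's actual file format or any participating system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triple:
    subject: str
    prop: str
    obj: str


@dataclass
class DataTextPair:
    data: List[Triple]  # a set of DBpedia-style triples (toy values here)
    text: str           # a verbalisation of those triples


def segment_by_subject(triples: List[Triple]) -> List[List[Triple]]:
    """Naive segmentation heuristic (assumption): one sentence chunk per
    subject entity, with chunks kept in first-seen order."""
    chunks = {}
    for t in triples:
        chunks.setdefault(t.subject, []).append(t)
    return list(chunks.values())


pair = DataTextPair(
    data=[
        Triple("Alan_Bean", "occupation", "Test_pilot"),
        Triple("Alan_Bean", "mission", "Apollo_12"),
        Triple("Apollo_12", "operator", "NASA"),
    ],
    text="Alan Bean was a test pilot who flew on Apollo 12. "
         "Apollo 12 was operated by NASA.",
)
chunks = segment_by_subject(pair.data)
for chunk in chunks:
    print([t.prop for t in chunk])
```

Grouping by subject is only one crude way to chunk input; it also hints at aggregation, since the two `Alan_Bean` triples end up in one sentence rather than repeating the subject.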
Conference of the European Chapter of the Association for Computational Linguistics | 2014
Laura Perez-Beltrachini; Claire Gardent; Enrico Franconi
We present a natural language generation system which supports the incremental specification of ontology-based queries in natural language. Our contribution is twofold. First, we introduce a chart-based surface realisation algorithm which supports the kind of incremental processing required by ontology-based querying. Crucially, this algorithm avoids confusing the end user by preserving a consistent ordering of the query elements throughout the incremental query formulation process. Second, we show that grammar-based surface realisation better supports the generation of fluent, natural-sounding queries than previous template-based approaches.
Natural Language Engineering | 2011
Claire Gardent; Benjamin Gottesman; Laura Perez-Beltrachini
Feature-based regular tree grammars (FRTG) can be used to generate the derivation trees of a feature-based tree adjoining grammar (FTAG). We make use of this fact to specify and implement both an FTAG-based sentence realiser and a benchmark generator for this realiser. We argue furthermore that the FRTG encoding enables us to improve on other proposals based on a grammar of TAG derivation trees in several ways. It preserves the compositional semantics that can be encoded in feature-based TAGs; it increases efficiency and restricts overgeneration; and it provides a uniform resource for generation, benchmark construction and parsing.
Joint Conference on Lexical and Computational Semantics | 2016
Laura Perez-Beltrachini; Claire Gardent
A difficult task when generating text from knowledge bases (KBs) is finding appropriate lexicalisations for KB symbols. We present an approach for lexicalising knowledge base relations and apply it to DBpedia data. Our model learns low-dimensional embeddings of words and RDF resources and uses these representations to score RDF properties against candidate lexicalisations. Training our model using (i) pairs of RDF triples and automatically generated verbalisations of these triples and (ii) pairs of paraphrases extracted from various resources yields competitive results on DBpedia data.
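The scoring idea can be sketched as ranking candidate lexicalisations by cosine similarity between an RDF property's vector and the averaged word vectors of each candidate phrase. This assumes hand-set toy vectors in a hypothetical `EMB` table; the model described in the abstract learns its embeddings from triple/verbalisation and paraphrase pairs, which this sketch does not attempt.

```python
import math
from typing import Dict, List

# Hypothetical toy embeddings (hand-set, NOT learned) for one RDF property and a few words.
EMB: Dict[str, List[float]] = {
    "birthPlace": [0.9, 0.1, 0.0],
    "was": [0.1, 0.1, 0.1],
    "born": [0.8, 0.2, 0.0],
    "in": [0.2, 0.1, 0.1],
    "works": [0.0, 0.9, 0.1],
    "for": [0.1, 0.2, 0.1],
}
DIM = 3


def embed(tokens: List[str]) -> List[float]:
    """Average the vectors of known tokens; zero vector if none are known."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def cosine(u: List[float], v: List[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rank_lexicalisations(prop: str, candidates: List[str]) -> List[str]:
    """Order candidate phrases from most to least similar to the property."""
    p = embed([prop])
    return sorted(candidates, key=lambda c: cosine(p, embed(c.split())), reverse=True)


ranking = rank_lexicalisations("birthPlace", ["works for", "was born in"])
print(ranking)  # ['was born in', 'works for']
```

Averaging word vectors is the simplest way to get a phrase representation; a learned model could use a richer composition, but the ranking step stays the same.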
Computational Linguistics | 2017
Claire Gardent; Laura Perez-Beltrachini
Although there has been much work in recent years on data-driven natural language generation, little attention has been paid to the fine-grained interactions that arise during microplanning between aggregation, surface realization, and sentence segmentation. In this article, we propose a hybrid symbolic/statistical approach to jointly model the constraints regulating these interactions. Our approach integrates a small handwritten grammar, a statistical hypertagger, and a surface realization algorithm. It is applied to the verbalization of knowledge base queries and tested on 13 knowledge bases to demonstrate domain independence. We evaluate our approach in several ways. A quantitative analysis shows that the hybrid approach outperforms a purely symbolic approach in terms of both speed and coverage. Results from a human study indicate that users find the output of this hybrid statistical/symbolic system more fluent than both a template-based and a purely symbolic grammar-based approach. Finally, we illustrate by means of examples that our approach can account for various factors impacting aggregation, sentence segmentation, and surface realization.
International Conference on Natural Language Generation | 2016
Rania Mohammed; Laura Perez-Beltrachini; Claire Gardent
In this paper, we introduce a content selection method where the communicative goal is to describe entities of different categories (e.g., astronauts, universities or monuments). We argue that this method provides an interesting basis both for generating descriptions of entities and for semi-automatically constructing a benchmark on which to train, test and compare data-to-text generation systems.
Workshop on Innovative Use of NLP for Building Educational Applications | 2012
Laura Perez-Beltrachini; Claire Gardent; German Kruszewski
International Conference on Computational Linguistics | 2010
Claire Gardent; Laura Perez-Beltrachini
International Conference on Natural Language Generation | 2017
Claire Gardent; Anastasia Shimorina; Shashi Narayan; Laura Perez-Beltrachini