Publication


Featured research published by Simon Mille.


Applications of Natural Language to Data Bases | 2012

From ontology to NL: generation of multilingual user-oriented environmental reports

Nadjet Bouayad-Agha; Gerard Casamayor; Simon Mille; Marco Rospocher; Horacio Saggion; Luciano Serafini; Leo Wanner

Natural Language Generation (NLG) from knowledge bases (KBs) has repeatedly been the subject of research. However, most proposals have in common that they start from KBs of limited size that either already contain linguistically-oriented knowledge structures or to whose structures different ways of realization are explicitly assigned. To avoid these limitations, we propose a three-layer OWL-based ontology framework in which domain, domain communication and linguistic knowledge structures are clearly separated, and we show how a large-scale instantiation of this framework in the environmental domain serves multilingual NLG.


ACM Transactions on Accessible Computing | 2015

Making It Simplext: Implementation and Evaluation of a Text Simplification System for Spanish

Horacio Saggion; Sanja Štajner; Stefan Bott; Simon Mille; Luz Rello

The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts that are adapted to the specific needs of particular users. Most research in the area of automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification that are grounded in the analysis of a corpus manually simplified for people with special needs. We carried out an automatic evaluation of the system’s output, taking into account the interaction between three different modules dedicated to different simplification aspects. One evaluation is based on readability metrics for Spanish and shows that the system is able to reduce the lexical and syntactic complexity of the texts. We also show, by means of a human evaluation, that sentence meaning is preserved in most cases. Although our work represents the first automatic text simplification system for Spanish that addresses different linguistic aspects, our results are comparable to the state of the art in English automatic text simplification.


International Symposium on Environmental Software Systems | 2011

Building an Environmental Information System for Personalized Content Delivery

Leo Wanner; Stefanos Vrochidis; Sara Tonelli; Jürgen Moßgraber; Harald Bosch; Ari Karppinen; Maria Myllynen; Marco Rospocher; Nadjet Bouayad-Agha; Ulrich Bügel; Gerard Casamayor; Thomas Ertl; Ioannis Kompatsiaris; Tarja Koskentalo; Simon Mille; Anastasia Moumtzidou; Emanuele Pianta; Horacio Saggion; Luciano Serafini; V. Tarvainen

Citizens are increasingly aware of the influence of environmental and meteorological conditions on their quality of life. This results in an increasing demand for personalized environmental information, i.e., information that is tailored to citizens’ specific context and background. In this work we describe the development of an environmental information system that addresses this demand in its full complexity. Specifically, we aim at developing a system that supports submission of user-generated queries related to environmental conditions. From the technical point of view, the system is tuned to discover reliable data on the web and to process these data in order to convert them into knowledge, which is stored in a dedicated repository. At run time, this information is transferred into an ontology-structured knowledge base, from which information relevant to the specific user is then deduced and communicated in the language of their preference.


ACM Transactions on Speech and Language Processing | 2012

Perspective-oriented generation of football match summaries: Old tasks, new challenges

Nadjet Bouayad-Agha; Gerard Casamayor; Simon Mille; Leo Wanner

Team sports commentaries call for techniques that are able to select content and generate wordings to reflect the affinity of the targeted reader for one of the teams. The existing works tend to have in common that they either start from knowledge sources of limited size to whose structures then different ways of realization are explicitly assigned, or they work directly with linguistic corpora, without the use of a deep knowledge source. With the increasing availability of large-scale ontologies this is no longer satisfactory: techniques are needed that are applicable to general purpose ontologies, but which still take user preferences into account. We take the best of both worlds in that we use a two-layer ontology. The first layer is composed of raw domain data modelled in an application-independent base OWL ontology. The second layer contains a rich perspective generation-motivated domain communication knowledge ontology, inferred from the base ontology. The two-layer ontology allows us to take into account user perspective-oriented criteria at different stages of generation to generate perspective-oriented commentaries. We show how content selection, discourse structuring, information structure determination, and lexicalization are driven by these criteria and how stage after stage a truly user perspective-tailored summary is generated. The viability of our proposal has been evaluated for the generation of football match summaries of the First Spanish Football League. The reported outcome of the evaluation demonstrates that we are on the right track.


North American Chapter of the Association for Computational Linguistics | 2015

Data-driven sentence generation with non-isomorphic trees

Miguel Ballesteros; Bernd Bohnet; Simon Mille; Leo Wanner

Abstract structures from which the generation naturally starts often do not contain any functional nodes, while surface-syntactic structures or a chain of tokens in a linearized tree contain all of them. Therefore, data-driven linguistic generation needs to be able to cope with the projection between non-isomorphic structures that differ in their topology and number of nodes. So far, such a projection has been a challenge in data-driven generation and was largely avoided. We present a fully stochastic generator that is able to cope with projection between non-isomorphic structures. The generator, which starts from PropBank-like structures, consists of a cascade of SVM-classifier based submodules that map in a series of transitions the input structures onto sentences. The generator has been evaluated for English on the Penn-Treebank and for Spanish on the multi-layered Ancora-UPF corpus.


Natural Language Engineering | 2016

Data-driven deep-syntactic dependency parsing

Miguel Ballesteros; Bernd Bohnet; Simon Mille; Leo Wanner

‘Deep-syntactic’ dependency structures that capture the argumentative, attributive and coordinative relations between full words of a sentence have a great potential for a number of NLP-applications. The abstraction degree of these structures is in between the output of a syntactic dependency parser (connected trees defined over all words of a sentence and language-specific grammatical functions) and the output of a semantic parser (forests of trees defined over individual lexemes or phrasal chunks and abstract semantic role labels which capture the frame structures of predicative elements and drop all attributive and coordinative dependencies). We propose a parser that provides deep-syntactic structures. The parser has been tested on Spanish, English and Chinese.


Artificial Intelligence Applications and Innovations | 2012

Personalized Environmental Service Orchestration for Quality of Life Improvement

Leo Wanner; Stefanos Vrochidis; Marco Rospocher; Jürgen Moßgraber; Harald Bosch; Ari Karppinen; Maria Myllynen; Sara Tonelli; Nadjet Bouayad-Agha; Gerard Casamayor; Thomas Ertl; Désirée Hilbring; Lasse Johansson; Kostas D. Karatzas; Ioannis Kompatsiaris; Tarja Koskentalo; Simon Mille; Anastasia Moumtzidou; Emanuele Pianta; Luciano Serafini; V. Tarvainen

Environmental and meteorological conditions are of utmost importance for the population, as they are strongly related to the quality of life. Citizens are increasingly aware of this importance. This awareness results in an increasing demand for environmental information tailored to their specific needs and background. We present an environmental information platform that supports submission of user queries related to environmental conditions and orchestrates results from complementary services to generate personalized suggestions. From the technical viewpoint, the system discovers and processes reliable data on the web in order to convert them into knowledge. At run time, this information is transferred into an ontology-structured knowledge base, from which information relevant to the specific user is then deduced and communicated in the language of their preference. The platform is demonstrated with real-world use cases in southern Finland, showing the impact it can have on the quality of everyday life.


International Conference on Natural Language Generation | 2014

Classifiers for data-driven deep sentence generation

Miguel Ballesteros; Simon Mille; Leo Wanner

State-of-the-art statistical sentence generators deal with isomorphic structures only. Therefore, given that semantic and syntactic structures tend to differ in their topology and number of nodes, i.e., are not isomorphic, statistical generation has so far been confined to shallow, syntactic generation. In this paper, we present a series of fine-grained classifiers that are essential for data-driven deep sentence generation in that they handle the problem of the projection of non-isomorphic structures.


Information Processing and Management | 2017

Using genre-specific features for patent summaries

Joan Codina-Filbà; Nadjet Bouayad-Agha; Alicia Burga; Gerard Casamayor; Simon Mille; Andreas Müller; Horacio Saggion; Leo Wanner

Targeted summarization technique for patent material. Segment as intra-sentence summarization unit. Exploitation of lexical chains across the whole patent document. Full-fledged text generation techniques for summarization.

Patent search is recall-driven, which goes hand in hand with at least a partial sacrifice of precision. As a consequence, patent analysts regularly have to view and examine a large number of patents. This implies a very high workload. Interactive analysis aids that help to minimize this workload are thus in high demand. Still, these aids do not reduce the amount of material to be examined; they only facilitate its examination. A reduction can be achieved by working with patent summaries instead of full patent documents. So far, high-quality patent summaries are produced mainly manually, and only a few research works address the problem of automatic patent summarization. Most often, these works either replicate the summarization metrics known from general discourse summarization or focus on the claims of a patent. However, it can be observed that neither of the strategies is adequate: general discourse state-of-the-art summarization techniques are of limited use due to the idiosyncrasies of the patent genre, and techniques that focus on claims only miss in their summaries important details provided in the other sections on the components of the invention introduced in the claims. We propose a patent summarization technique that takes the idiosyncrasies of the patent genre (such as the unbalanced distribution of the content across the different sections of a patent, excessive length of the sentences in the claims, abstract vocabulary, etc.) into account to obtain a comprehensive summary of the invention. In particular, we make use of lexical chains in the claims and in the description of the invention and of aligned claim-description segments at the subsentential level to assess the relevance of the individual fragments of the document for the summary. The most relevant fragments are selected and merged using full-fledged natural language generation techniques.


Practical Applications of Agents and Multi-Agent Systems | 2017

KRISTINA: A Knowledge-Based Virtual Conversation Agent

Leo Wanner; Elisabeth André; Josep Blat; Stamatia Dasiopoulou; Mireia Farrús; Thiago Fraga; Eleni Kamateri; Florian Lingenfelser; Gerard Llorach; Oriol Martinez; Georgios Meditskos; Simon Mille; Wolfgang Minker; Louisa Pragst; Dominik Schiller; Andries Stam; Ludo Stellingwerff; Federico M. Sukno; Bianca Vieru; Stefanos Vrochidis

We present an intelligent embodied conversation agent with linguistic, social and emotional competence. Unlike the vast majority of the state-of-the-art conversation agents, the proposed agent is constructed around an ontology-based knowledge model that allows for flexible reasoning-driven dialogue planning, instead of using predefined dialogue scripts. It is further complemented by multimodal communication analysis and generation modules and a search engine for the retrieval of multimedia background content from the web needed for conducting a conversation on a given topic. The evaluation of the 1st prototype of the agent shows a high degree of acceptance of the agent by the users with respect to its trustworthiness, naturalness, etc. The individual technologies are being further improved in the 2nd prototype.

Collaboration


Dive into Simon Mille's collaborations.

Top Co-Authors

Leo Wanner

Pompeu Fabra University

Alicia Burga

Pompeu Fabra University

Stefanos Vrochidis

Information Technology Institute

Bernd Bohnet

University of Birmingham
