Gil Francopoulo
Centre national de la recherche scientifique
Publications
Featured research published by Gil Francopoulo.
international conference on computational linguistics | 2004
Laurent Romary; Susanne Salmon-Alt; Gil Francopoulo
Lexical resources are key components for applications related to human language technology. Various models of lexical resources have been designed and implemented during the last twenty years and the scientific community has now gained enough experience to design a common standard at an international level. This paper thus describes the ongoing activity within ISO/TC 37/SC 4 on LMF (Lexical Markup Framework) and shows how it can be concretely implemented for the design of an on-line morphological resource for French in the Morphalou project.
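To make the kind of structure LMF prescribes concrete, the sketch below builds a minimal LMF-style entry for a French verb using only the Python standard library. The element and feature names follow commonly published LMF examples loosely; the actual serialization used in the Morphalou project may differ.

```python
# A minimal sketch of an LMF-style morphological entry. Element and feature
# names loosely follow published LMF examples; they are not Morphalou's DTD.
import xml.etree.ElementTree as ET

def feat(parent, att, val):
    """Attach an LMF feature (attribute/value pair) to a component."""
    ET.SubElement(parent, "feat", att=att, val=val)

lexicon = ET.Element("Lexicon")
feat(lexicon, "language", "fra")

entry = ET.SubElement(lexicon, "LexicalEntry")
feat(entry, "partOfSpeech", "verb")

lemma = ET.SubElement(entry, "Lemma")
feat(lemma, "writtenForm", "manger")

# One inflected form among many; a full resource would enumerate the paradigm.
word_form = ET.SubElement(entry, "WordForm")
feat(word_form, "writtenForm", "mangeons")
feat(word_form, "person", "first")
feat(word_form, "grammaticalNumber", "plural")
feat(word_form, "tense", "present")

print(ET.tostring(lexicon, encoding="unicode"))
```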
language resources and evaluation | 2009
Gil Francopoulo; Núria Bel; Monte George; Nicoletta Calzolari; Monica Monachini; Mandy Pet; Claudia Soria
Optimizing the production, maintenance and extension of lexical resources is one of the crucial aspects impacting natural language processing (NLP). A second aspect involves optimizing the process leading to their integration in applications. In this respect, we believe that a consensual specification on monolingual, bilingual and multilingual lexicons can be a useful aid for the various NLP actors. Within ISO, one purpose of the Lexical Markup Framework (LMF, ISO-24613) is to define a standard for lexicons that covers multilingual lexical data.
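As an illustration of how such a specification can relate monolingual lexicons, the sketch below links senses across languages through an interlingual pivot, in the spirit of LMF's multilingual extension. The class and field names are simplified stand-ins, not the normative ISO 24613 vocabulary.

```python
# Illustrative sketch of LMF-style multilingual linkage: senses from
# monolingual lexicons are connected through a shared interlingual axis.
# Class and field names are simplified, not the normative ISO 24613 ones.
from dataclasses import dataclass, field

@dataclass
class Sense:
    lexicon_language: str
    lemma: str
    gloss: str

@dataclass
class SenseAxis:
    """Pivot node relating equivalent senses across languages."""
    axis_id: str
    senses: list = field(default_factory=list)

    def add(self, sense: Sense) -> None:
        self.senses.append(sense)

    def translations(self, target_language: str) -> list:
        return [s.lemma for s in self.senses
                if s.lexicon_language == target_language]

# Link an English, a French and a Spanish sense through one axis.
axis = SenseAxis("axis-001")
axis.add(Sense("eng", "river", "large natural watercourse"))
axis.add(Sense("fra", "fleuve", "cours d'eau se jetant dans la mer"))
axis.add(Sense("spa", "río", "corriente natural de agua"))

print(axis.translations("fra"))   # ['fleuve']
```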
Proceedings of the Workshop on Multilingual Language Resources and Interoperability | 2006
Gil Francopoulo; Núria Bel; Monte George; Nicoletta Calzolari; Monica Monachini; Mandy Pet; Claudia Soria
Optimizing the production, maintenance and extension of lexical resources is one of the crucial aspects impacting Natural Language Processing (NLP). A second aspect involves optimizing the process leading to their integration in applications. In this respect, we believe that the production of a consensual specification on multilingual lexicons can be a useful aid for the various NLP actors. Within ISO, one purpose of LMF (ISO-24613) is to define a standard for lexicons that covers multilingual data.
Natural Language Engineering | 2016
Aida Khemakhem; Bilel Gargouri; Abdelmajid Ben Hamadou; Gil Francopoulo
In this paper, we address the problem of building large-coverage dictionaries of the Arabic language usable both for direct human reading and for automatic Natural Language Processing. For these purposes, we propose a normalized and implemented model, based on the Lexical Markup Framework (LMF, ISO 24613) and the Data Category Registry (DCR, ISO 12620), which allows a stable and well-defined interoperability of lexical resources through a unification of linguistic concepts. Starting from the features of the Arabic language, and because a large range of details and refinements need to be described specifically for Arabic, we follow a fine-grained structuring strategy. Besides its richness in morphological, syntactic and semantic knowledge, our model includes all the Arabic morphological patterns needed to generate the inflected forms from a given lemma and highlights the syntactic–semantic relations. In addition, an appropriate codification has been designed for the management of all types of relationships among lexical entries and their related knowledge. According to this model, a dictionary named El Madar has been built and is now publicly available online. The data are managed by a user-friendly Web-based lexicographical workstation. This work has not been done in isolation, but is the result of a collaborative effort by an international team, mainly within the ISO network, over a period of eight years.
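To illustrate the pattern-based generation the abstract mentions, the toy sketch below interdigitates a triliteral root with a template. Latin transliteration is used for readability, and the digit-based pattern notation is an assumption made purely for illustration; it is not the encoding used in El Madar.

```python
# A toy sketch of root-and-pattern generation, the mechanism used to produce
# inflected and derived forms from a lemma. Transliteration and the digit
# notation are illustrative assumptions, not the dictionary's actual encoding.
def apply_pattern(root: str, pattern: str) -> str:
    """Interdigitate a triliteral root into a template.

    Digits '1', '2', '3' in the pattern stand for the first, second and
    third radical of the root; every other character is copied verbatim.
    """
    radicals = list(root)
    return "".join(radicals[int(ch) - 1] if ch.isdigit() else ch
                   for ch in pattern)

root = "ktb"                           # k-t-b, "to write"
print(apply_pattern(root, "1a2i3"))    # katib  (active participle, "writer")
print(apply_pattern(root, "ma12a3"))   # maktab (place noun, "office/desk")
```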
international conference on computational linguistics | 1988
Michael Zock; Gil Francopoulo; Abdellatif Laroui
We present here a system under development, the present goals of which are to assist (a) students in inductively learning a set of rules to generate sentences in French, and (b) psychologists in gathering data on natural language learning. Instead of claiming an all-encompassing model or theory, we prefer to elaborate a tool which is general and flexible enough to permit the testing of various theories. By controlling parameters such as initial knowledge and the nature and order of the data, we can empirically determine how each parameter affects the efficiency of learning. Our ultimate goal is the modelling of human learning by machine. Learning is viewed as problem-solving, i.e. as the creation and reduction of a search space. By integrating the student into the process, that is, by encouraging him to ask an expert (the system) certain kinds of questions, such as: Can one say x? How does one say x? Why does one say x?, we can enhance not only the efficiency of the learning, but also our understanding of the underlying processes. By keeping a trace of the whole dialogue (what questions have been asked at what time), we should be able to infer the student's learning strategies.
Computers and The Humanities | 1989
Michael Zock; Abdellatif Laroui; Gil Francopoulo
We describe a system under development whose goal is to provide a “natural” environment for students learning to produce sentences in French. The learning objective is personal pronouns; the method is inductive (learning through exploration). The inputs of the learning component are conceptual structures (meanings) and the corresponding linguistic forms (sentences); its outputs are rules characterizing these data. The learning is dialogue-based, that is to say, the student may ask certain kinds of questions, such as: How does one say 〈idea〉? Can one say 〈linguistic form〉? Why does one say 〈linguistic form〉?, and the system answers them. By integrating the student into the process, that is, by encouraging him to build and explore a search space, we hope to enhance not only his learning efficiency (what and how to learn), but also our understanding of the underlying processes. By analyzing the trace of the dialogue (what questions have been asked at what moment), we may infer the strategies a student put to use. Although the system covers far more than what is discussed here, we restrict our discussion to a small subset of grammar, personal pronouns, which are known to be a notorious problem in both first and second language learning.
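As a rough illustration of the dialogue loop described above, the sketch below answers the three question types against a toy rule base. The rules and explanations are invented stand-ins, far simpler than the system's actual grammatical knowledge.

```python
# A hedged sketch of the question-answering dialogue: the student asks one
# of three question types and the system answers from its grammar. The rule
# base below is a toy stand-in, not the system's actual knowledge.
RULES = {
    # meaning -> accepted surface form, for a tiny pronoun example
    "Marie sees the book": "Marie le voit",
}
EXPLANATIONS = {
    "Marie le voit": "The direct-object pronoun 'le' precedes the finite verb.",
}

def answer(question: str, argument: str) -> str:
    if question == "how":          # How does one say <idea>?
        return RULES.get(argument, "No rule covers that meaning yet.")
    if question == "can":          # Can one say <linguistic form>?
        return "yes" if argument in RULES.values() else "no"
    if question == "why":          # Why does one say <linguistic form>?
        return EXPLANATIONS.get(argument, "No explanation recorded.")
    return "Unknown question type."

print(answer("how", "Marie sees the book"))   # Marie le voit
print(answer("can", "Marie voit le"))         # no
print(answer("why", "Marie le voit"))         # pronoun placement rule
```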
Archive | 2015
Joseph Mariani; Gil Francopoulo
We feel it is important to have a clear picture of what exists in terms of Language Resources and Evaluation (LRE) in order to be able to carry out research investigations in computational linguistics and develop language processing systems. Language coverage is especially important in order to provide technologies that can support multilingualism and protect endangered languages. This implies knowing what is necessary and already exists for some languages, detecting the gaps for other languages, and finding a way to address them. In order to have access to that information, we based our study on the LRE Map, which was produced within the FLaReNet EC project. The LRE Map is built on data gathered at conferences directly from the authors, and therefore provides actual data obtained at the source, not an estimate of such data. At the time of this study, it covered 10 conferences from 2010 to 2012. We consider here Language Resources (LR) in the broad sense, including Data, Tools, Evaluation and Meta-Resources (standards, metadata, guidelines, etc.). We took into consideration the names, types, modalities and languages attached to each entry in the LRE Map. A huge amount of manual cleaning was necessary before the data could be used. In order to check the availability of Language Resources for the various languages, we designed a software tool called “LRE Matrix” that automatically produces Language Matrices presenting the number of resources of various types that exist for various modalities for each language. We slightly modified the software code in order to also compute the number of times a Language Resource is mentioned, what we may call a “Language Resource Impact Factor” (LRIF). Given their quantitative, objective nature, our results are valuable for comparing the situation of the various national and regional languages in Europe regarding the availability of Language Resources, in a survey conducted within the META-NET network. Our studies required a tedious normalization and cleaning process, which showed the necessity of assigning a Unique and Persistent Identifier to each Language Resource in order to identify it more easily and follow its use and changes over time, a process that requires international coordination.
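As a rough sketch of what the LRE Matrix computation amounts to, the snippet below counts resources per language and type and tallies mentions per resource name (the LRIF). The entry fields are assumed to mirror the LRE Map structure; the real tool has to deal with far messier data.

```python
# A minimal sketch of the "LRE Matrix" idea: per-language counts of resources
# by type, plus a mention count per resource name (the LRIF). Entry fields
# are assumed to mirror the LRE Map; sample records are invented.
from collections import Counter, defaultdict

entries = [
    {"name": "Europarl", "type": "Corpus",  "language": "French"},
    {"name": "Europarl", "type": "Corpus",  "language": "German"},
    {"name": "Europarl", "type": "Corpus",  "language": "French"},
    {"name": "WordNet",  "type": "Lexicon", "language": "English"},
]

matrix = defaultdict(Counter)   # language -> resource type -> count
lrif = Counter()                # resource name -> number of mentions

for e in entries:
    matrix[e["language"]][e["type"]] += 1
    lrif[e["name"]] += 1

print(dict(matrix["French"]))   # {'Corpus': 2}
print(lrif.most_common(2))      # [('Europarl', 3), ('WordNet', 1)]
```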
GWAI-86 und 2. Österreichische Artificial-Intelligence-Tagung | 1986
Gil Francopoulo
Our goal is to construct a natural language “understanding” program which integrates syntactic and semantic processing. The present article is about syntactic parsing. Among the various algorithms proposed (Winograd), we prefer the deterministic analysis principle (see (Rady) for the justification). In order to recognize the diverse grammatical templates of the French language, the processing rules are necessarily complex.
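As a toy illustration of the deterministic analysis principle, the sketch below parses a short French sentence in a single left-to-right pass with no backtracking. The grammar and rule conditions are hypothetical and far simpler than the French templates the system actually covers.

```python
# A toy illustration of deterministic parsing: each decision is made once,
# using the stack and limited lookahead into the buffer, with no backtracking.
# The two reduction rules below are hypothetical, not the system's grammar.
def parse(words, lexicon):
    buffer = [(lexicon[w], w) for w in words]   # (category, word) cells
    stack = []
    while buffer:
        stack.append(buffer.pop(0))
        # Rule: DET + N at the top of the stack reduce immediately to NP.
        if len(stack) >= 2 and stack[-2][0] == "DET" and stack[-1][0] == "N":
            n = stack.pop(); det = stack.pop()
            stack.append(("NP", (det, n)))
        # Rule: NP + V + NP reduce to S once the buffer shows nothing follows.
        if (len(stack) >= 3 and [c for c, _ in stack[-3:]] == ["NP", "V", "NP"]
                and not buffer):
            np2 = stack.pop(); v = stack.pop(); np1 = stack.pop()
            stack.append(("S", (np1, v, np2)))
    return stack

lexicon = {"le": "DET", "chat": "N", "mange": "V", "la": "DET", "souris": "N"}
print(parse("le chat mange la souris".split(), lexicon))
```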
language and technology conference | 2011
Alexander Pak; Patrick Paroubek; Amel Fraisse; Gil Francopoulo
N-gram models with a binary (or tf-idf) weighting scheme and SVM classifiers are commonly used together as a baseline approach in many research studies on sentiment analysis and opinion mining. Other, more advanced methods are used on top of this model to improve the classification accuracy, such as the generation of additional features or the use of supplementary linguistic resources. In this paper, we show how a simple technique can improve both the overall classification accuracy and the classification of minor reviews by normalizing the term weights in the basic bag-of-words method. Any other term selection scheme may also benefit from this improved weighting scheme, as long as it is based on the n-gram model. We have tested our approach on movie review and product review datasets in English and show that our normalization technique enhances the classification accuracy of the traditional weighting schemes. Whether similar performance increases would be observed for other language families remains to be investigated, but our weighting scheme can easily address any other language, since it does not use any language-specific resource apart from a training corpus.
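The baseline described above can be sketched in a few lines; the snippet below compares binary n-gram features fed to a linear SVM with and without a simple rescaling of the term weights. The unit-length normalization used here is only a stand-in for the scheme actually proposed in the paper, and the training texts are invented.

```python
# A hedged sketch of the baseline: binary n-gram features + linear SVM,
# with and without a simple normalization of the term weights. The l2
# rescaling is a stand-in for the paper's normalization, not its method.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import Normalizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

train_texts = ["great movie , loved it", "boring plot and weak acting",
               "wonderful acting", "terrible , do not watch"]
train_labels = [1, 0, 1, 0]

baseline = make_pipeline(
    CountVectorizer(binary=True, ngram_range=(1, 2)),
    LinearSVC())

normalized = make_pipeline(
    CountVectorizer(binary=True, ngram_range=(1, 2)),
    Normalizer(norm="l2"),          # rescale each review vector to unit length
    LinearSVC())

for model in (baseline, normalized):
    model.fit(train_texts, train_labels)
    print(model.predict(["weak boring movie", "loved the acting"]))
```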
international conference on computational linguistics | 2008
Anne Vilnat; Gil Francopoulo; O. Hamon; Sylvain Loiseau; Patrick Paroubek; Éric Villemonte de la Clergerie
This article presents the methodology of the PASSAGE project, which aims at syntactically annotating large corpora by composing annotations. It introduces the annotation format and the syntactic annotation specifications. It then describes an important component of the methodology, namely a Web-based evaluation service, deployed in the context of the first PASSAGE parser evaluation campaign.