Christian Boitet
Joseph Fourier University
Publication
Featured research published by Christian Boitet.
international conference on computational linguistics | 2008
M. G. Abbas Malik; Christian Boitet; Pushpak Bhattacharyya
Finite-state Transducers (FST) can be very efficient to implement inter-dialectal transliteration. We illustrate this on the Hindi and Urdu language pair. FSTs can also be used for translation between surface-close languages. We introduce UIT (universal intermediate transcription) for the same pair on the basis of their common phonetic repository in such a way that it can be extended to other languages like Arabic, Chinese, English, French, etc. We describe a transliteration model based on FST and UIT, and evaluate it on Hindi and Urdu corpora.
international conference on computational linguistics | 2000
Gilles Sérasset; Christian Boitet
After 3 years of specifying the UNL (Universal Networking Language) language and prototyping deconverters from more than 12 languages and enconverters for about 4, the UNL project has opened to the community by publishing the specifications (v2.0) of the UNL language, intended to encode the meaning of NL utterances as semantic hypergraphs and to be used as a pivot representation in multilingual information and communication systems. A UNL document is an html document with special tags to delimit the utterances and their rendering in UNL and in all natural languages currently handled. UNL can be viewed as the future html of the linguistic content. It is only an interface format, leading as well to the reuse of existing NLP components as to the development of original tools in a variety of possible applications, from automatic rough enconversion for information retrieval and information gathering translation to partially interactive enconversion or deconversion for higher quality. We illustrate these points by describing a UNL-French deconverter organized as a specific localizer followed by a classical MT transfer and an existing generator.
meeting of the association for computational linguistics | 2009
Avinash Malik; Laurent Besacier; Christian Boitet; Pushpak Bhattacharyya
We report in this paper a novel hybrid approach for Urdu to Hindi transliteration that combines finite-state machine (FSM) based techniques with a statistical word language model based approach. The output from the FSM is filtered with the word language model to produce the correct Hindi output. The main problem handled is the omission of diacritical marks from the input Urdu text. Our system produces the correct Hindi output even when the crucial information in the form of diacritical marks is absent. The approach improves the accuracy of the transducer-only approach from 50.7% to 79.1%. The results reported show that performance can be improved using a word language model to disambiguate the output produced by the transducer-only approach, especially when diacritical marks are not present in the Urdu input.
international conference on computational linguistics | 2002
Christian Boitet; Mathieu Mangeot; Gilles Sérasset
The PAPILLON project aims at creating a cooperative, free, permanent, web-oriented and personalizable environment for the development and the consultation of a multilingual lexical database. The initial motivation is the lack of dictionaries, both for humans and machines, between French and many Asian languages. In particular, although there are large F-J paper usage dictionaries, they are usable only by readers literate in Japanese, as they never contain both original (kanji/kana) and romaji writing. The same applies to Thai, Vietnamese, Lao, etc.
international conference on acoustics, speech, and signal processing | 2006
Laurent Besacier; Viet Bac Le; Christian Boitet; Vincent Berment
There are more than 6000 languages in the world but only a small number possess the resources required for implementation of human language technologies (HLT). Thus, HLT are mostly concerned with languages for which large resources are available or which have suddenly become of interest because of the economic or political scene. On the contrary, languages from developing countries or minorities have been less worked on in the past years. One way of reducing this language divide is to do more research on portability of HLT for multilingual applications. In this paper, we concentrate on speech-to-speech translation. We present here our methodology for fast development of ASR systems for under-resourced languages or, as they are called now, pi-languages (poorly equipped). We present the resources collected for Vietnamese, and the experimental results of our first Vietnamese ASR system. The current validation of our methodology for Khmer is described next. We also discuss some issues related to machine translation and present first contributions of our laboratory in this context of pi-languages.
international conference on computational linguistics | 1982
Christian Boitet; Pierre Guillaume; Maurice Quézel-Ambrunaz
ARIANE-78.4 is a computer system designed to offer an adequate environment for constructing machine translation programs, for running them, and for (humanly) revising the rough translations produced by the computer. ARIANE-78 has been operational at GETA for more than 4 years now. This paper refers to version 4. It has been used for a number of applications (Russian and Japanese, English to French and Malay, Portuguese to English) and has constantly been amended to meet the needs of the users. Parts of this system have been presented before [2,3,7,8], but its whole has only been described in internal technical documents.
international conference on computational linguistics | 2009
Christian Boitet; Igor M. Boguslavskij; Jesús Cardeñosa
In a recent experiment on translating a web site into 4 languages, we have confirmed that using MT results in translators mode can reduce the human work to produce good translations of complex sentences (25 w) at a rate of 25 mn/p with all-purpose commercial MT and at 20 mn/p with lab quality MT. A subexperiment has shown that using deconversions from quality-checked interlingual representations (UNL graphs) reduced the time spent down to 10 mn/p. Reducing the considerable time now needed for producing and checking UNL graphs is possible, which leads to very good usability prospects in situations involving many target languages and allowing for interactive disambiguation of source text or correction of interlingua. An analysis of improvable aspects in both interlingua design and resource building leads to a roadmap towards UNL++ in the framework of the U++C consortium, including strong mutualization (collaborative volunteer work) and open-source aspects.
international conference on computational linguistics | 1980
Christian Boitet; Philippe Chatelin; P. Daun Fraga
Useful automatized translation must be considered in a problem-solving setting, composed of a linguistic environment and a computer environment. We examine the facets of the problem which we believe to be essential, and try to give some paradigms along each of them. Those facets are the linguistic strategy, the programming tools, the treatment of semantics, the computer environment and the types of implementation.
international conference on computational linguistics | 2005
Valérie Bellynck; Christian Boitet; John Kenwright
The first stage of the ITOLDU project aims to facilitate technical English teaching, especially for vocabulary acquisition. We are pursuing two immediate goals: maximizing positive student contributions, even outside of the classroom, and minimizing teacher intervention. The resulting application is designed to support investigations on what can entice users to contribute collaboratively towards enriching a bilingual technical lexicon in a fertile teaching context. The second stage will be to investigate how to use ITOLDU and similar tools to elicit free (but not necessarily voluntary or even conscious) contributions to the research-oriented, linguistically very rich multilingual PAPILLON lexical database.
natural language processing and knowledge engineering | 2009
Mohammad Daoud; Christian Boitet; Kyo Kageura; Asanobu Kitamoto; Daoud Daoud; Mathieu Mangeot
We describe the concept of dedicated Multilingual Preterminological Graphs (MPGs), and some automatic approaches for constructing them by analyzing the behavior of online community users. A Multilingual Preterminological Graph is a special lexical resource that contains a massive number of terms related to a special domain, and can be used as raw material to later build a standardized terminological repository. Building such a graph is difficult using traditional approaches, as it needs huge efforts by domain specialists and terminologists. In our approach, we build such a graph by analyzing the access log files of the community's website, finding the important terms that have been used to search that website, and their associations with each other. We aim to make this graph a seed repository to which multilingual volunteers can contribute. We are experimenting with this approach on the Digital Silk Road Project. We have used its access log files since its beginning in 2003, and obtained an initial graph of around 116,000 terms. As an application, we used this graph to obtain a preterminological multilingual database that is serving a CLIR system for the DSR project.