Joseph Le Roux | Researchain

Publication

Featured research published by Joseph Le Roux.

Computational Linguistics | 2013

XMG: eXtensible MetaGrammar

Benoît Crabbé; Denys Duchier; Claire Gardent; Joseph Le Roux; Yannick Parmentier

In this article, we introduce eXtensible MetaGrammar (XMG), a framework for specifying tree-based grammars such as Feature-Based Lexicalized Tree-Adjoining Grammars (FB-LTAG) and Interaction Grammars (IG). We argue that XMG displays three features that facilitate both grammar writing and a fast prototyping of tree-based grammars. Firstly, XMG is fully declarative. For instance, it permits a declarative treatment of diathesis that markedly departs from the procedural lexical rules often used to specify tree-based grammars. Secondly, the XMG language has a high notational expressivity in that it supports multiple linguistic dimensions, inheritance, and a sophisticated treatment of identifiers. Thirdly, XMG is extensible in that its computational architecture facilitates the extension to other linguistic formalisms. We explain how this architecture naturally supports the design of three linguistic formalisms, namely, FB-LTAG, IG, and Multi-Component Tree-Adjoining Grammar (MC-TAG). We further show how it permits a straightforward integration of additional mechanisms such as linguistic and formal principles. To further illustrate the declarativity, notational expressivity, and extensibility of XMG, we describe the methodology used to specify an FB-LTAG for French augmented with a unification-based compositional semantics. This illustrates both how XMG facilitates the modeling of the tree fragment hierarchies required to specify tree-based grammars and of a syntax/semantics interface between semantic representations and syntactic trees. Finally, we briefly report on several grammars for French, English, and German that were implemented using XMG and compare XMG with other existing grammar specification frameworks for tree-based grammars.

MOZ'04 Proceedings of the Second international conference on Multiparadigm Programming in Mozart/Oz | 2004

The metagrammar compiler: an NLP application with a multi-paradigm architecture

Denys Duchier; Joseph Le Roux; Yannick Parmentier

The concept of metagrammar was introduced to factorize the information contained in a grammar. A metagrammar compiler can then be used to compute an actual grammar from a metagrammar. In this paper, we present a new metagrammar compiler based on two important concepts from logic programming, namely (1) Warren's Abstract Machine and (2) constraints on finite sets.

Empirical Methods in Natural Language Processing | 2015

Foreebank: Syntactic Analysis of Customer Support Forums

Rasoul Samad Zadeh Kaljahi; Jennifer Foster; Johann Roturier; Corentin Ribeyre; Teresa Lynn; Joseph Le Roux

We present a new treebank of English and French technical forum content which has been annotated for grammatical errors and phrase structure. This double annotation allows us to empirically measure the effect of errors on parsing performance. While it is slightly easier to parse the corrected versions of the forum sentences, the errors are not the main factor in making this kind of text hard to parse.

International Conference on Implementation and Application of Automata | 2006

Lexical disambiguation with polarities and automata

Guillaume Bonfante; Joseph Le Roux; Guy Perrier

We propose a method for lexical disambiguation based on polarities for Interaction Grammars (IGs), well suited for coordination.

ACM Transactions on Speech and Language Processing | 2013

Combining compound recognition and PCFG-LA parsing with word lattices and conditional random fields

Matthieu Constant; Joseph Le Roux; Anthony Sigogne

The integration of compounds in a parsing procedure has been shown to improve accuracy in an artificial context where such expressions have been perfectly preidentified. This article evaluates two empirical strategies to incorporate such multiword units in a real PCFG-LA parsing context: (1) the use of a grammar including compound recognition, thanks to specialized annotation schemes for compounds; (2) the use of a state-of-the-art discriminative compound prerecognizer integrating endogenous and exogenous features. We show how these two strategies can be combined with word lattices representing possible lexical analyses generated by the recognizer. The proposed systems display significant gains in terms of multiword recognition and often in terms of standard parsing accuracy. Moreover, we show through an oracle analysis that this combined strategy opens promising new research directions.

Meeting of the Association for Computational Linguistics | 2016

Dependency Parsing with Bounded Block Degree and Well-nestedness via Lagrangian Relaxation and Branch-and-Bound

Caio Corro; Joseph Le Roux; Mathieu Lacroix; Antoine Rozenknop; Roberto Wolfler Calvo

We present a novel dependency parsing method which enforces two structural properties on dependency trees: bounded block degree and well-nestedness. These properties are useful to better represent the set of admissible dependency structures in treebanks and connect dependency parsing to context-sensitive grammatical formalisms. We cast this problem as an Integer Linear Program that we solve with Lagrangian relaxation, from which we derive a heuristic and an exact method based on a branch-and-bound search. Experimentally, we see that these methods are efficient and competitive compared to a baseline unconstrained parser, while enforcing structural properties in all cases.

International Conference on Computational Linguistics | 2014

LIPN: Introducing a new Geographical Context Similarity Measure and a Statistical Similarity Measure based on the Bhattacharyya coefficient

Davide Buscaldi; Jorge J. García Flores; Joseph Le Roux; Nadi Tomeh; Belém Priego Sanchez

This paper describes the system used by the LIPN team in Task 10, Multilingual Semantic Textual Similarity, at SemEval 2014, in both the English and Spanish sub-tasks. The system uses a support vector regression model, combining different text similarity measures as features. Compared with our 2013 participation, we included a new feature to take into account the geographical context and a new semantic distance based on the Bhattacharyya distance calculated on co-occurrence distributions derived from the Spanish Google Books n-grams dataset.

Conference of the European Chapter of the Association for Computational Linguistics | 2006

XMG: an expressive formalism for describing tree-based grammars

Yannick Parmentier; Joseph Le Roux; Benoît Crabbé

In this paper, we introduce eXtensible MetaGrammar, a system that facilitates the development of tree-based grammars. This system includes both (1) a formal language adapted to the description of linguistic information and (2) a compiler for this language. It applies techniques of logic programming (e.g., Warren's Abstract Machine), thus providing an efficient and theoretically motivated framework for the processing of linguistic meta-descriptions.

Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms | 2006

A Constraint Driven Metagrammar

Joseph Le Roux; Benoît Crabbé; Yannick Parmentier

We present an operational framework for expressing a large-scale Tree Adjoining Grammar (TAG) using higher-level operational constraints on tree descriptions. These constraints, first meant to guarantee the well-formedness of the grammatical units, may also be viewed as a way to put model-theoretic syntax to work through an efficient offline grammatical compilation process. Our strategy preserves the formal properties of TAG and hence ensures reasonable processing efficiency.

North American Chapter of the Association for Computational Linguistics | 2016

Deep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework

Matthieu Constant; Joseph Le Roux; Nadi Tomeh

We explore the consequences of representing token segmentations as hierarchical structures (trees) for the task of Multiword Expression (MWE) recognition, in isolation or in combination with dependency parsing. We propose a novel representation of token segmentation as trees on tokens, resembling dependency trees. Given this new representation, we present and evaluate two different architectures to combine MWE recognition and dependency parsing in the easy-first framework: a pipeline and a joint system, both taking advantage of lexical and syntactic dimensions. We experimentally validate that MWE recognition significantly helps syntactic parsing.
