Matthieu Constant | Researchain

Publication

Featured researches published by Matthieu Constant.

Meeting of the Association for Computational Linguistics | 2014

Strategies for Contiguous Multiword Expression Analysis and Dependency Parsing

Marie Candito; Matthieu Constant

In this paper, we investigate various strategies to predict both syntactic dependency parsing and contiguous multiword expression (MWE) recognition, testing them on the dependency version of French Treebank (Abeille and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013). Our work focuses on using an alternative representation of syntactically regular MWEs, which captures their syntactic internal structure. We obtain a system with comparable performance to that of previous works on this dataset, but which predicts both syntactic dependencies and the internal structure of MWEs. This can be useful for capturing the various degrees of semantic compositionality of MWEs.

Meeting of the Association for Computational Linguistics | 2016

A Transition-Based System for Joint Lexical and Syntactic Analysis

Matthieu Constant; Joakim Nivre

We present a transition-based system that jointly predicts the syntactic structure and lexical units of a sentence by building two structures over the input words: a syntactic dependency tree and a ...

Meeting of the Association for Computational Linguistics | 2006

Outilex, a Linguistic Platform for Text Processing

Olivier Blanc; Matthieu Constant

We present Outilex, a generalist linguistic platform for text processing. The platform includes several modules implementing the main operations for text processing and is designed to use large-coverage Language Resources. These resources (dictionaries, grammars, annotated texts) are formatted into XML, in accordance with current standards. Evaluations on efficiency are given.

ACM Transactions on Speech and Language Processing | 2013

Combining compound recognition and PCFG-LA parsing with word lattices and conditional random fields

Matthieu Constant; Joseph Le Roux; Anthony Sigogne

The integration of compounds in a parsing procedure has been shown to improve accuracy in an artificial context where such expressions have been perfectly preidentified. This article evaluates two empirical strategies to incorporate such multiword units in a real PCFG-LA parsing context: (1) the use of a grammar including compound recognition, thanks to specialized annotation schemes for compounds; (2) the use of a state-of-the-art discriminative compound prerecognizer integrating endogenous and exogenous features. We show how these two strategies can be combined with word lattices representing possible lexical analyses generated by the recognizer. The proposed systems display significant gains in terms of multiword recognition and often in terms of standard parsing accuracy. Moreover, we show through an Oracle analysis that this combined strategy opens promising new research directions.

international conference natural language processing | 2008

Networking Multiword Units

Matthieu Constant; Patrick Watrin

This paper details a network infrastructure for representing and sharing multiword units. It enables connecting local networks describing linguistic semi-fixed components in the form of local grammars.

international conference natural language processing | 2002

On the Analysis of Locative Phrases with Graphs and Lexicon-Grammar: The Classifier/Proper Noun Pairing

Matthieu Constant

This paper analyses French locative prepositional phrases containing a location proper name Npr (e.g. Mediterranee) and its associated classifier Nc (e.g. mer). The (Nc, Npr) pairs are formally described with the aid of elementary sentences. We study their syntactic properties within adverbial support verb constructions and encode them in a Lexicon-Grammar Matrix. From this matrix, we build grammars in the form of graphs and evaluate their application to a journalistic corpus.

International Multiconference on Computer Science and Information Technology | 2009

Real-time unsupervised classification of web documents

Anthony Sigogne; Matthieu Constant

This paper addresses the problem of clustering dynamic collections of web documents. We present an iterative algorithm based on fine-grained keyword extraction (simple words, compound words, and proper nouns). Each new document inserted in the collection is either assigned to an existing class containing documents of the same topic, or assigned to a new class. After each step, when necessary, classes are refined using statistical techniques. The implementation of this algorithm was successfully integrated in an application used for Information Intelligence.
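
The assignment step described in this abstract (attach a new document to an existing class on the same topic, or open a new class) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the Jaccard similarity over keyword sets, the 0.3 threshold, the representation of a class as the union of its documents' keywords, and the function names `jaccard` and `assign`. The statistical refinement step is not shown.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two keyword sets, in [0, 1] (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign(classes: List[Set[str]], keywords: Set[str], threshold: float = 0.3) -> int:
    """Place a new document in the best-matching class, or open a new class.

    Each class is represented by the union of its documents' keywords
    (an assumption of this sketch). Returns the index of the class used.
    """
    best_index, best_score = -1, 0.0
    for i, class_keywords in enumerate(classes):
        score = jaccard(class_keywords, keywords)
        if score > best_score:
            best_index, best_score = i, score
    if best_index >= 0 and best_score >= threshold:
        # Same topic: extend the existing class with the new keywords.
        classes[best_index] |= keywords
        return best_index
    # No class is close enough: the document starts a new class.
    classes.append(set(keywords))
    return len(classes) - 1


classes: List[Set[str]] = []
first = assign(classes, {"parser", "french", "treebank"})
second = assign(classes, {"parser", "french", "mwe"})
third = assign(classes, {"football", "league"})
```

In this usage, the second document shares two of four distinct keywords with the first class and joins it, while the third shares none and opens a new class.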

International Conference on Implementation and Application of Automata | 2007

A finite-state super-chunker

Olivier Blanc; Matthieu Constant; Patrick Watrin

Language is full of multiword unit expressions that form basic semantic units. The identification of these structures limits the combinatorial complexity induced by lexical ambiguity. In this paper, we detail an experiment that largely integrates these notions in a finite-state procedure of segmentation into super-chunks, preliminary to a parser. We show that the chunker, developed for French, reaches 92.9% precision and 98.7% recall.

Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) | 2017

The ATILF-LLF System for Parseme Shared Task: a Transition-based Verbal Multiword Expression Tagger

Hazem Al Saied; Matthieu Constant; Marie Candito

We describe the ATILF-LLF system built for the MWE 2017 Shared Task on automatic identification of verbal multiword expressions. We participated in the closed track only, for all the 18 available languages. Our system is a robust greedy transition-based system, in which MWEs are identified through a MERGE transition. The system was meant to accommodate the variety of linguistic resources provided for each language, in terms of accompanying morphological and syntactic information. Using per-MWE F-score, the system was ranked first for all but two languages (Hungarian and Romanian).
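
To make the idea of a greedy transition-based tagger with a MERGE transition more concrete, here is a small sketch. It is not the ATILF-LLF system: the abstract does not list the full transition set or the classifier, so the SHIFT and COMPLETE transitions, the toy lexicon lookup that stands in for a trained classifier, and the names `is_prefix` and `tag_mwes` are all assumptions made for illustration. The sketch also handles contiguous expressions only.

```python
from typing import List, Set, Tuple

Span = Tuple[str, ...]


def is_prefix(seq: Span, lexicon: Set[Span]) -> bool:
    """True if seq starts (or equals) some lexicon entry."""
    return any(entry[:len(seq)] == seq for entry in lexicon)


def tag_mwes(tokens: List[str], lexicon: Set[Span]) -> Tuple[List[Span], List[str]]:
    """Greedy stack/buffer loop with SHIFT, MERGE and COMPLETE transitions.

    A lexicon lookup replaces the trained classifier (assumption of this sketch).
    Returns the recognized MWEs and the sequence of transitions applied.
    """
    stack: List[Span] = []
    buffer = list(tokens)
    mwes: List[Span] = []
    transitions: List[str] = []
    while buffer or stack:
        if len(stack) >= 2 and is_prefix(stack[-2] + stack[-1], lexicon):
            # MERGE: combine the two topmost stack elements into one unit.
            top = stack.pop()
            stack[-1] = stack[-1] + top
            transitions.append("MERGE")
        elif stack and (not buffer or not is_prefix(stack[-1] + (buffer[0],), lexicon)):
            # COMPLETE: the top element cannot grow; pop it and keep it if it is an MWE.
            unit = stack.pop()
            if len(unit) > 1 and unit in lexicon:
                mwes.append(unit)
            transitions.append("COMPLETE")
        else:
            # SHIFT: move the next token from the buffer onto the stack.
            stack.append((buffer.pop(0),))
            transitions.append("SHIFT")
    return mwes, transitions


lexicon = {("kicked", "the", "bucket")}
mwes, transitions = tag_mwes("she kicked the bucket yesterday".split(), lexicon)
```

On this toy sentence the loop shifts each token, merges twice to build the three-token unit, and emits it when it is completed. A real system would replace the lexicon lookup with a classifier that scores transitions from morphological and syntactic features.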

North American Chapter of the Association for Computational Linguistics | 2016

Deep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework

Matthieu Constant; Joseph Le Roux; Nadi Tomeh

We explore the consequences of representing token segmentations as hierarchical structures (trees) for the task of Multiword Expression (MWE) recognition, in isolation or in combination with dependency parsing. We propose a novel representation of token segmentation as trees on tokens, resembling dependency trees. Given this new representation, we present and evaluate two different architectures to combine MWE recognition and dependency parsing in the easy-first framework: a pipeline and a joint system, both taking advantage of lexical and syntactic dimensions. We experimentally validate that MWE recognition significantly helps syntactic parsing.
