Djamé Seddah | Researchain

Publication

Featured research published by Djamé Seddah.

Computational Linguistics | 2013

Parsing morphologically rich languages: Introduction to the special issue

Reut Tsarfaty; Djamé Seddah; Sandra Kübler; Joakim Nivre

Parsing is a key task in natural language processing. It involves predicting, for each natural language sentence, an abstract representation of the grammatical entities in the sentence and the relations between these entities. This representation provides an interface to compositional semantics and to the notions of “who did what to whom.” The last two decades have seen great advances in parsing English, leading to major leaps also in the performance of applications that use parsers as part of their backbone, such as systems for information extraction, sentiment analysis, text summarization, and machine translation. Attempts to replicate the success of parsing English for other languages have often yielded unsatisfactory results. In particular, parsing languages with complex word structure and flexible word order has been shown to require non-trivial adaptation. This special issue reports on methods that successfully address the challenges involved in parsing a range of morphologically rich languages (MRLs). This introduction characterizes MRLs, describes the challenges in parsing MRLs, and outlines the contributions of the articles in the special issue. These contributions present up-to-date research efforts that address parsing in varied, cross-lingual settings. They show that parsing MRLs addresses challenges that transcend particular representational and algorithmic choices.

Conference of the European Chapter of the Association for Computational Linguistics | 2009

On Statistical Parsing of French with Supervised and Semi-Supervised Strategies

Marie Candito; Benoît Crabbé; Djamé Seddah

This paper reports results on grammatical induction for French. We investigate how to best train a parser on the French Treebank (Abeille et al., 2003), viewing the task as a trade-off between generalizability and interpretability. We compare, for French, a supervised lexicalized parsing algorithm with a semi-supervised unlexicalized algorithm (Petrov et al., 2006) along the lines of (Crabbe and Candito, 2008). We report the best results known to us on French statistical parsing, which we obtained with the semi-supervised learning algorithm. The reported experiments can give insights for the task of grammatical learning for a morphologically rich language, with a relatively limited amount of training data, annotated with a rather flat structure.

International Workshop/Conference on Parsing Technologies | 2007

Adapting WSJ-Trained Parsers to the British National Corpus using In-Domain Self-Training

Jennifer Foster; Joachim Wagner; Djamé Seddah; Josef van Genabith

We introduce a set of 1,000 gold standard parse trees for the British National Corpus (BNC) and perform a series of self-training experiments with Charniak and Johnson's reranking parser and BNC sentences. We show that retraining this parser with a combination of one million BNC parse trees (produced by the same parser) and the original WSJ training data yields improvements of 0.4% on WSJ Section 23 and 1.7% on the new BNC gold standard set.

International Conference on Computational Linguistics | 2014

Alpage: Transition-based Semantic Graph Parsing with Syntactic Features

Corentin Ribeyre; Éric Villemonte de la Clergerie; Djamé Seddah

This paper describes the systems deployed by the ALPAGE team to participate in the SemEval-2014 Task on Broad-Coverage Semantic Dependency Parsing. We developed two transition-based dependency parsers with extended sets of actions to handle non-planar acyclic graphs. For the open track, we worked along two orthogonal axes, lexical and syntactic, in order to provide our models with lexical and syntactic features such as word clusters, lemmas and tree fragments of different types.

Journal of Logic and Computation | 2014

A word clustering approach to domain adaptation: Robust parsing of source and target domains

Djamé Seddah; Marie Candito; Enrique Henestroza Anguiano

We present a technique to improve out-of-domain statistical parsing by reducing lexical data sparseness in a PCFG-LA architecture. We replace terminal symbols with unsupervised word clusters acquired from a large newspaper corpus augmented with target-domain data. We also investigate the impact of guiding out-of-domain parsing with predicted part-of-speech tags. We provide an evaluation for French, and obtain improvements in performance for both non-technical and technical target domains. Though the improvements over a strong baseline are slight, an interesting result is that the proposed techniques also improve parsing performance on the source domain, contrary to techniques such as self-training, thus leading to a more robust parser overall. We also describe new target domain evaluation treebanks, freely available, that comprise a total of about 3,000 annotated sentences from the medical domain, regional newspaper articles, French Europarl and French Wikipedia.
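The terminal-replacement step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster table, the cluster identifiers, the `C_UNK` fallback, the lowercasing of tokens, and the function name `to_cluster_tokens` are all assumptions made for illustration. The idea is only that each word form is mapped to a cluster identifier before the sentence reaches the parser, so that rare and frequent words in the same cluster share statistics.

```python
from typing import Dict, List

# Hypothetical fallback symbol for words missing from the cluster table.
UNKNOWN_CLUSTER = "C_UNK"


def to_cluster_tokens(
    tokens: List[str],
    clusters: Dict[str, str],
    unknown: str = UNKNOWN_CLUSTER,
) -> List[str]:
    """Replace each token with its cluster identifier.

    Tokens are lowercased before lookup (an assumption of this sketch);
    tokens absent from the table map to the fallback symbol.
    """
    return [clusters.get(tok.lower(), unknown) for tok in tokens]


# Toy cluster table; real clusters would be induced without supervision
# from a large corpus, as the abstract describes.
clusters = {
    "le": "C01",
    "la": "C01",
    "patient": "C17",
    "malade": "C17",
    "dort": "C42",
}

sentence = ["Le", "malade", "dort"]
print(to_cluster_tokens(sentence, clusters))
```

In this toy table "patient" and "malade" share a cluster, so a parser trained on one would see the same terminal symbol for the other.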

International Workshop on Evaluation of Natural Language and Speech Tool for Italian | 2012

Data Driven Lemmatization and Parsing of Italian

Djamé Seddah; Joseph Le Roux; Benoît Sagot

This paper presents some preliminary results for data-driven lemmatisation for Italian. Based on a joint lemmatisation and part-of-speech tagging model, our system relies on an architecture that has already proved successful for French. Besides intrinsic evaluation for this task, we want to measure its usefulness and adequacy by using our system's output as input for the task of parsing. This approach achieves state-of-the-art parsing accuracy on unlabeled text without any gold information supplied (an F1 score of 83.70% in a 10-fold cross-validation setting), without requiring any prior knowledge of the language. This shows that our methodology is perfectly suitable for wide-coverage parsing of Italian.

Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, August 3-4, 2017, Vancouver, Canada, ISBN 978-1-945626-70-8, pp. 243-252 | 2017

The ParisNLP entry at the ConLL UD Shared Task 2017: A Tale of a #ParsingTragedy

Éric Villemonte de la Clergerie; Benoît Sagot; Djamé Seddah

We present the ParisNLP entry at the UD CoNLL 2017 parsing shared task. In addition to the UDpipe models provided, we built our own data-driven tokenization models, sentence segmenter and lexicon-based morphological analyzers. All of these were used with a range of different parsing models (neural or not, feature-rich or not, transition or graph-based, etc.) and the best combination for each language was selected. Unfortunately, a glitch in the shared task’s Matrix led our model selector to run generic, weakly lexicalized models, tailored for surprise languages, instead of our dataset-specific models. Because of this #ParsingTragedy, we officially ranked 27th, whereas our real models finally unofficially ranked 6th.

North American Chapter of the Association for Computational Linguistics | 2015

Because Syntax Does Matter: Improving Predicate-Argument Structures Parsing with Syntactic Features

Corentin Ribeyre; Éric Villemonte de la Clergerie; Djamé Seddah

Parsing full-fledged predicate-argument structures in a deep syntax framework requires graphs to be predicted. Using the DeepBank (Flickinger et al., 2012) and the Predicate-Argument Structure treebank (Miyao and Tsujii, 2005) as a test field, we show how transition-based parsers, extended to handle connected graphs, benefit from the use of topologically different syntactic features such as dependencies, tree fragments, spines or syntactic paths, bringing a much needed context to the parsing models, improving notably over long distance dependencies and elided coordinate structures. By confirming this positive impact on an accurate 2nd-order graph-based parser (Martins and Almeida, 2014), we establish a new state-of-the-art on these data sets.

North American Chapter of the Association for Computational Linguistics | 2010

Statistical Parsing of Morphologically Rich Languages (SPMRL) What, How and Whither

Reut Tsarfaty; Djamé Seddah; Yoav Goldberg; Sandra Kuebler; Yannick Versley; Marie Candito; Jennifer Foster; Ines Rehbein; Lamia Tounsi

Empirical Methods in Natural Language Processing | 2013

Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages

Djamé Seddah; Reut Tsarfaty; Sandra Kübler; Marie Candito; Jinho D. Choi; Richárd Farkas; Jennifer Foster; Iakes Goenaga; Koldo Gojenola Galletebeitia; Yoav Goldberg; Spence Green; Nizar Habash; Marco Kuhlmann; Wolfgang Maier; Joakim Nivre; Adam Przepiórkowski; Ryan M. Roth; Wolfgang Seeker; Yannick Versley; Veronika Vincze; Marcin Woliński; Alina Wróblewska; Éric Villemonte de la Clergerie
