Jean Beney

Institut National des Sciences Appliquées de Lyon

Publications

Featured research published by Jean Beney.

International Andrei Ershov Memorial Conference on Perspectives of System Informatics | 2003

Multi-classification of Patent Applications with Winnow

Cornelis H. A. Koster; Marc Seutter; Jean Beney

The Winnow family of learning algorithms can cope well with large numbers of features and is tolerant to variations in document length, which makes it suitable for classifying large collections of large documents, like patent applications.

Patent Information Retrieval | 2009

Phrase-based document categorization revisited

Cornelis H. A. Koster; Jean Beney

This paper takes a fresh look at an old idea in Information Retrieval: the use of linguistically extracted phrases as terms in the automatic categorization (aka classification) of documents. Until now, little or no evidence has been found that document categorization benefits from the application of linguistic techniques. Classification algorithms using the most cleverly designed linguistic representations typically do no better than those using the simple bag-of-words representation. Shallow linguistic techniques are used routinely, but their positive effect on accuracy is small at best. We have investigated the use of dependency triples as terms in document categorization, which are derived according to a dependency model based on the notion of aboutness. The documents are syntactically analyzed by a parser and transduced to dependency trees, which in turn are unnested into dependency triples following the aboutness-based model. In the process, various normalizing transformations are applied to enhance recall. We describe a sequence of large-scale experiments with different document representations, test collections and even languages, presenting evidence that adding such triples to the words in a bag-of-terms document representation may lead to a significant increase in the accuracy of document categorization.
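
The idea of adding triples to the words of a bag-of-terms representation can be illustrated with a minimal sketch. It is not the authors' system: the function name `bag_of_terms`, the bracketed triple encoding, the relation labels, and the hand-written triples are all assumptions made for illustration, and lowercasing stands in for the normalizing transformations mentioned in the abstract.

```python
from collections import Counter


def bag_of_terms(words, triples=()):
    """Count words and, optionally, dependency triples as terms of one document."""
    bag = Counter(w.lower() for w in words)
    for head, relation, modifier in triples:
        # Encode each triple as one composite term (illustrative format only).
        bag[f"[{head.lower()},{relation},{modifier.lower()}]"] += 1
    return bag


words = ["the", "parser", "analyzes", "patent", "applications"]
# Hand-written triples standing in for real parser output (hypothetical labels).
triples = [
    ("analyze", "SUBJ", "parser"),
    ("analyze", "OBJ", "application"),
    ("application", "ATTR", "patent"),
]

words_only = bag_of_terms(words)
words_plus_triples = bag_of_terms(words, triples)
```

Both bags can be fed to the same classifier, so the effect of the extra triple terms can be measured by comparing accuracy on the two representations.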

International Andrei Ershov Memorial Conference on Perspectives of System Informatics | 2006

On the importance of parameter tuning in text categorization

Cornelis H. A. Koster; Jean Beney

Text Categorization algorithms have a large number of parameters that determine their behaviour. The effect of these parameters is not easily predicted objectively or intuitively, and may very well depend on the corpus or on the document representation. Their values are usually taken from previously published results, which may lead to less than optimal accuracy when experimenting on particular corpora. In this paper we investigate the effect of parameter tuning on the accuracy of two Text Categorization algorithms: the well-known Rocchio algorithm and the lesser-known Winnow. We show that the optimal parameter values for a specific corpus are sometimes very different from those found in the literature. We show that the effect of individual parameters is corpus-dependent, and that parameter tuning can greatly improve the accuracy of both Winnow and Rocchio. We argue that the dependence of the categorization algorithms on experimentally established parameter values makes it hard to compare the outcomes of different experiments, and we propose the automatic determination of optimal parameters on the train set as a solution.
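
As a rough illustration of tuning parameters on the train set, the sketch below implements a basic textbook (positive) Winnow with multiplicative promotion and demotion, plus a small grid search over a held-out slice of the training data. It is not the variant or the procedure used in the paper: the function names, the default threshold, the parameter grid, and the toy documents are all assumptions.

```python
from itertools import product


def train_winnow(docs, labels, alpha=2.0, theta=None, epochs=5):
    """Train a basic (positive) Winnow on documents given as sets of terms."""
    vocab = {term for doc in docs for term in doc}
    if theta is None:
        theta = len(vocab) / 2  # assumed default, not a published value
    weights = {term: 1.0 for term in vocab}
    for _ in range(epochs):
        for doc, label in zip(docs, labels):
            predicted = sum(weights[t] for t in doc) >= theta
            if predicted and not label:      # false positive: demote active terms
                for t in doc:
                    weights[t] /= alpha
            elif label and not predicted:    # false negative: promote active terms
                for t in doc:
                    weights[t] *= alpha
    return weights, theta


def predict(model, doc):
    weights, theta = model
    # Terms never seen in training contribute nothing to the score.
    return sum(weights.get(t, 0.0) for t in doc) >= theta


def accuracy(model, docs, labels):
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


def tune(fit_docs, fit_labels, held_docs, held_labels, alphas, thetas):
    """Grid-search alpha and theta on a held-out slice of the train set."""
    def score(params):
        model = train_winnow(fit_docs, fit_labels, alpha=params[0], theta=params[1])
        return accuracy(model, held_docs, held_labels)
    return max(product(alphas, thetas), key=score)


# Toy data: a document is relevant when it contains the term "patent".
docs = [{"patent", "claim"}, {"claim", "novel"}, {"patent", "novel"}, {"recipe", "novel"}]
labels = [True, False, True, False]

model = train_winnow(docs, labels)
best_alpha, best_theta = tune(docs[:2], labels[:2], docs[2:], labels[2:],
                              alphas=[1.5, 2.0], thetas=[1.0, 2.0])
```

Because the search only ever sees training documents, the chosen values can differ from corpus to corpus without touching the test set, which is the point the abstract makes about comparability.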

Patent Information Retrieval | 2011

Phrase-Based Document Categorization

Cornelis H. A. Koster; Jean Beney; Suzan Verberne; Merijn Vogel

This chapter takes a fresh look at an old idea in Information Retrieval: the use of linguistically extracted phrases as terms in the automatic categorization of documents, and in particular the pre-classification of patent applications. In Information Retrieval, little or no evidence had been found until now that document categorization benefits from the application of linguistic techniques. Classification algorithms using the most cleverly designed linguistic representations typically did not perform better than those using the simple bag-of-words representation. We have investigated the use of dependency triples as terms in document categorization, according to a dependency model based on the notion of aboutness and using normalizing transformations to enhance recall. We describe a number of large-scale experiments with different document representations, test collections and even languages, presenting evidence that adding such triples to the words in a bag-of-terms document representation may lead to a statistically significant increase in the accuracy of document categorization.

Compiler Construction | 1990

STARLET: an affix-based compiler compiler designed as a logic programming system

Jean Beney; Jean-François Boulicaut

We present STARLET, a new compiler compiler which compiles Extended Affix Grammars defining a translation into an executable program: the translator. We look at its operational semantics and focus on the points where it is close to or different from Prolog's procedural semantics. We discuss two interwoven issues: Program Reliability (due to many static checks) and Program Efficiency (optimizations at compile time). Both are achieved through a systematic use of grammatical properties.

International Symposium on Programming Language Implementation and Logic Programming | 1991

On the borderline between grammars and programs

Cornelis H. A. Koster; Jean Beney

We describe some of the engineering considerations and trade-offs in the design of a new Compiler Description Language, CDL3. The language is based on Extended Affix Grammars, where the affix rules are used to define tree types. The execution model is deterministic and depth-first, except that part of the work can be delayed until a second pass over the implicit parse tree. It is checked statically whether the program can indeed be executed in two passes without backtracking. A simple module structure allows separate compilation in a safe way.

Archive | 2001