Jean Beney
Institut National des Sciences Appliquées de Lyon
Publication
Featured research published by Jean Beney.
international andrei ershov memorial conference on perspectives of system informatics | 2003
Cornelis H. A. Koster; Marc Seutter; Jean Beney
The Winnow family of learning algorithms can cope well with large numbers of features and is tolerant to variations in document length, which makes it suitable for classifying large collections of large documents, like patent applications.
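As an illustration of the multiplicative, mistake-driven updates that characterize the Winnow family, here is a minimal sketch of basic positive Winnow over sparse binary term features. It is not necessarily the variant used in the paper; the function names, the promotion factor `alpha`, the default threshold (half the vocabulary size), and the toy documents are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

Example = Tuple[Set[str], bool]  # (terms present in the document, class label)


def train_winnow(examples: List[Example], alpha: float = 2.0,
                 theta: Optional[float] = None,
                 epochs: int = 5) -> Tuple[Dict[str, float], float]:
    """Train basic positive Winnow; returns (term weights, threshold)."""
    vocab: Set[str] = set()
    for terms, _ in examples:
        vocab |= terms
    if theta is None:
        theta = len(vocab) / 2  # assumed default threshold for this sketch
    weights = {t: 1.0 for t in vocab}  # all weights start at 1
    for _ in range(epochs):
        for terms, label in examples:
            predicted = sum(weights[t] for t in terms) >= theta
            if predicted and not label:
                for t in terms:          # false positive: demote active terms
                    weights[t] /= alpha
            elif label and not predicted:
                for t in terms:          # false negative: promote active terms
                    weights[t] *= alpha
    return weights, theta


def predict(weights: Dict[str, float], theta: float, terms: Set[str]) -> bool:
    """Classify a document; terms unseen in training contribute nothing."""
    return sum(weights.get(t, 0.0) for t in terms) >= theta


# Toy data, invented for illustration only.
docs: List[Example] = [
    ({"patent", "claim", "invention"}, True),
    ({"claim", "device", "invention"}, True),
    ({"recipe", "flour", "claim"}, False),
    ({"recipe", "sugar", "oven"}, False),
]
weights, theta = train_winnow(docs)
```

Only the weights of terms present in a misclassified document are touched, so the cost of an update depends on the number of terms in that document rather than on the size of the vocabulary.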
patent information retrieval | 2009
Cornelis H. A. Koster; Jean Beney
This paper takes a fresh look at an old idea in Information Retrieval: the use of linguistically extracted phrases as terms in the automatic categorization (aka classification) of documents. Until now, little or no evidence has been found that document categorization benefits from the application of linguistic techniques. Classification algorithms using the most cleverly designed linguistic representations typically do no better than those using simply the bag-of-words representation. Shallow linguistic techniques are used routinely, but their positive effect on the accuracy is small at best. We have investigated the use of dependency triples as terms in document categorization, which are derived according to a dependency model based on the notion of aboutness. The documents are syntactically analyzed by a parser and transduced to dependency trees, which in turn are unnested into dependency triples following the aboutness-based model. In the process, various normalizing transformations are applied to enhance recall. We describe a sequence of large-scale experiments with different document representations, test collections and even languages, presenting evidence that adding such triples to the words in a bag-of-terms document representation may lead to a significant increase in the accuracy of document categorization.
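The idea of adding triples to the words in a bag-of-terms representation can be sketched as follows. This is only an illustration: the parser is not shown, the triples are supplied by hand, and the `(head, relation, modifier)` layout, the relation labels, and the bracketed string encoding are hypothetical choices rather than the paper's actual format.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, modifier): assumed layout


def bag_of_terms(words: Iterable[str], triples: Iterable[Triple]) -> Counter:
    """Count single words and dependency triples in one shared term space."""
    terms: Counter = Counter(w.lower() for w in words)
    for head, relation, modifier in triples:
        # Each triple becomes one extra term, encoded as a bracketed string.
        terms[f"[{head.lower()},{relation},{modifier.lower()}]"] += 1
    return terms


# Hand-written example input; a real system would obtain the triples from a parser.
words: List[str] = ["The", "parser", "analyzes", "documents"]
triples: List[Triple] = [("parser", "SUBJ", "analyze"),
                         ("analyze", "OBJ", "document")]
bag = bag_of_terms(words, triples)
```

Because words and triples share one term space, any classifier that accepts a bag of words can consume the enriched representation unchanged.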
international andrei ershov memorial conference on perspectives of system informatics | 2006
Cornelis H. A. Koster; Jean Beney
Text Categorization algorithms have a large number of parameters that determine their behaviour, whose effect is not easily predicted objectively or intuitively and may very well depend on the corpus or on the document representation. Their values are usually adopted from previously published results, which may lead to less than optimal accuracy in experimenting on particular corpora. In this paper we investigate the effect of parameter tuning on the accuracy of two Text Categorization algorithms: the well-known Rocchio algorithm and the lesser-known Winnow. We show that the optimal parameter values for a specific corpus are sometimes very different from those found in the literature. We show that the effect of individual parameters is corpus-dependent, and that parameter tuning can greatly improve the accuracy of both Winnow and Rocchio. We argue that the dependence of the categorization algorithms on experimentally established parameter values makes it hard to compare the outcomes of different experiments and propose the automatic determination of optimal parameters on the training set as a solution.
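A minimal sketch of determining parameters automatically from the training data, assuming a common textbook formulation of Rocchio (positive centroid weighted by `beta`, negative centroid weighted by `gamma`, negative weights clipped) and a plain grid search scored on a held-out part of the training set. The parameter names, the grid values, the held-out split, and the toy documents are assumptions for illustration; the abstract does not specify the paper's actual procedure.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Doc = Dict[str, float]            # term -> frequency
Labeled = List[Tuple[Doc, bool]]  # (document, class label)


def rocchio_prototype(train: Labeled, beta: float, gamma: float) -> Doc:
    """Build a class prototype: beta * mean(positives) - gamma * mean(negatives)."""
    pos = [d for d, y in train if y]
    neg = [d for d, y in train if not y]
    proto: Doc = {}
    for group, coeff in ((pos, beta), (neg, -gamma)):
        for doc in group:
            for term, freq in doc.items():
                proto[term] = proto.get(term, 0.0) + coeff * freq / len(group)
    return {t: w for t, w in proto.items() if w > 0.0}  # clip negative weights


def accuracy(proto: Doc, threshold: float, docs: Labeled) -> float:
    """Fraction of documents whose thresholded score matches the label."""
    hits = 0
    for doc, label in docs:
        score = sum(proto.get(t, 0.0) * f for t, f in doc.items())
        hits += (score > threshold) == label
    return hits / len(docs)


def tune(train: Labeled, held_out: Labeled, betas: Sequence[float],
         gammas: Sequence[float],
         thresholds: Sequence[float]) -> Tuple[float, float, float, float]:
    """Grid search; returns (best accuracy, beta, gamma, threshold)."""
    best = None
    for beta, gamma, threshold in product(betas, gammas, thresholds):
        acc = accuracy(rocchio_prototype(train, beta, gamma), threshold, held_out)
        if best is None or acc > best[0]:
            best = (acc, beta, gamma, threshold)
    return best


# Toy data, invented for illustration; held_out is carved from the training material.
train: Labeled = [({"patent": 2, "claim": 1}, True),
                  ({"claim": 1, "recipe": 2}, False),
                  ({"patent": 1, "device": 1}, True),
                  ({"recipe": 1, "oven": 1}, False)]
held_out: Labeled = [({"patent": 1, "claim": 1}, True),
                     ({"oven": 2, "claim": 1}, False)]
best = tune(train, held_out, betas=[1.0, 16.0], gammas=[0.0, 4.0],
            thresholds=[0.0, 1.0])
```

Since the search never looks at the test set, results obtained with the selected values remain comparable across experiments, which is the concern the abstract raises.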
patent information retrieval | 2011
Cornelis H. A. Koster; Jean Beney; Suzan Verberne; Merijn Vogel
This chapter takes a fresh look at an old idea in Information Retrieval: the use of linguistically extracted phrases as terms in the automatic categorization of documents, and in particular the pre-classification of patent applications. In Information Retrieval, until now little or no evidence has been found that document categorization benefits from the application of linguistic techniques. Classification algorithms using the most cleverly designed linguistic representations typically did not perform better than those using simply the bag-of-words representation. We have investigated the use of dependency triples as terms in document categorization, according to a dependency model based on the notion of aboutness and using normalizing transformations to enhance recall. We describe a number of large-scale experiments with different document representations, test collections and even languages, presenting evidence that adding such triples to the words in a bag-of-terms document representation may lead to a statistically significant increase in the accuracy of document categorization.
compiler construction | 1990
Jean Beney; Jean-François Boulicaut
We present STARLET, a new compiler compiler which compiles Extended Affix Grammars defining a translation into an executable program: the translator. We look at its operational semantics and focus on the points where it resembles or differs from Prolog's procedural semantics. We discuss two interwoven issues: Program Reliability (due to many static checks) and Program Efficiency (optimizations at compile time). Both are achieved through a systematic use of grammatical properties.
international symposium on programming language implementation and logic programming | 1991
Cornelis H. A. Koster; Jean Beney
We describe some of the engineering considerations and trade-offs in the design of a new Compiler Description Language, CDL3. The language is based on Extended Affix Grammars, where the affix rules are used to define tree types. The execution model is deterministic and depth-first, except that part of the work can be delayed until a second pass over the implicit parse tree. It is checked statically whether the program can indeed be executed in two passes without backtracking. A simple module structure allows separate compilation in a safe way.
Archive | 2001
Cornelis H. A. Koster; Marc Seutter; Jean Beney
text retrieval conference | 2000
Avi Arampatzis; Jean Beney; Cornelis H. A. Koster
cross-language evaluation forum | 2010
Jean Beney
SPLT | 1986
Jean Beney; Jean-François Boulicaut