Amaury Habrard

Centre national de la recherche scientifique

Publication

Featured research published by Amaury Habrard.

international colloquium on grammatical inference | 2008

A Polynomial Algorithm for the Inference of Context Free Languages

Alexander Clark; Rémi Eyraud; Amaury Habrard

We present a polynomial algorithm for the inductive inference of a large class of context free languages, that includes all regular languages. The algorithm uses a representation which we call Binary Feature Grammars based on a set of features, capable of representing richly structured context free languages as well as some context sensitive languages. More precisely, we focus on a particular case of this representation where the features correspond to contexts appearing in the language. Using the paradigm of positive data and a membership oracle, we can establish that all context free languages that satisfy two constraints on the context distributions can be identified in the limit by this approach. The polynomial time algorithm we propose is based on a generalisation of distributional learning and uses the lattice of context occurrences. The formalism and the algorithm seem well suited to natural language and in particular to the modelling of first language acquisition.

Pattern Recognition | 2008

Learning probabilistic models of tree edit distance

Marc Bernard; Laurent Boyer; Amaury Habrard; Marc Sebban

Nowadays, there is a growing interest in machine learning and pattern recognition for tree-structured data. Trees provide a suitable structural representation to deal with complex tasks such as web information extraction, RNA secondary structure prediction, computer music, or conversion of semi-structured data (e.g. XML documents). Many applications in these domains require the calculation of similarities over pairs of trees. In this context, the tree edit distance (ED) has been the subject of investigation for many years in order to improve its computational efficiency. However, used in its classical form, the tree ED needs a priori fixed edit costs, which are often difficult to tune and leave little room for tackling complex problems. In this paper, to overcome this drawback, we focus on the automatic learning of a non-parametric stochastic tree ED. More precisely, we are interested in two kinds of probabilistic approaches. The first builds a generative model of the tree ED from a joint distribution over the edit operations, while the second works from a conditional distribution, thus providing a discriminative model. To tackle these tasks, we present an adaptation of the expectation-maximization algorithm for learning these distributions over the primitive edit costs. Two experiments are conducted. The first is carried out on artificial data and confirms the benefit of learning a tree ED rather than imposing edit costs a priori; the second is applied to a pattern recognition task aiming to classify handwritten digits.

conference on learning theory | 2006

Learning rational stochastic languages

François Denis; Yann Esposito; Amaury Habrard

Given a finite set of words w_1, ..., w_n independently drawn according to a fixed unknown distribution law P called a stochastic language, a usual goal in Grammatical Inference is to infer an estimate of P in some class of probabilistic models, such as Probabilistic Automata (PA). Here, we study the class S^rat_R(Σ) of rational stochastic languages, which consists of stochastic languages that can be generated by Multiplicity Automata (MA) and which strictly includes the class of stochastic languages generated by PA. Rational stochastic languages have a minimal normal representation which may be very concise, and whose parameters can be efficiently estimated from stochastic samples. We design an efficient inference algorithm DEES which aims at building a minimal normal representation of the target. Despite the fact that no recursively enumerable class of MA computes exactly S^rat_Q(Σ), we show that DEES strongly identifies S^rat_Q(Σ) in the limit. We study the intermediary MA output by DEES and show that they compute rational series which converge absolutely and which can be used to provide stochastic languages which closely estimate the target.

algorithmic learning theory | 2010

A spectral approach for probabilistic grammatical inference on trees

Raphaël Bailly; Amaury Habrard; François Denis

We focus on the estimation of a probability distribution over a set of trees. We consider here the class of distributions computed by weighted automata, a strict generalization of probabilistic tree automata. This class of distributions (called rational distributions, or rational stochastic tree languages, RSTL) has an algebraic characterization: all the residuals (conditional distributions) of such distributions lie in a finite-dimensional vector subspace. We propose a methodology based on Principal Component Analysis to identify this vector subspace. We provide an algorithm that computes an estimate of the target residuals vector subspace and builds a model which computes an estimate of the target distribution.
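
The finite-dimensional characterization mentioned in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: it uses strings over a one-letter alphabet instead of trees, exact probabilities instead of a sample, and exact Gaussian elimination instead of Principal Component Analysis. The distribution `p` and the helper names `hankel` and `rank` are hypothetical. Row i of the matrix lists the (unnormalized) residual of the prefix a^i, so a small rank means that all residuals live in a small subspace.

```python
from fractions import Fraction


def p(n):
    """Probability of the word a^n under a hypothetical mixture of two geometric laws."""
    first = Fraction(1, 2) * Fraction(1, 2) ** n    # geometric law with ratio 1/2
    second = Fraction(2, 3) * Fraction(1, 3) ** n   # geometric law with ratio 1/3
    return Fraction(1, 2) * first + Fraction(1, 2) * second


def hankel(size):
    """Matrix whose entry (i, j) is p(a^(i+j)); row i is the residual of the prefix a^i."""
    return [[p(i + j) for j in range(size)] for i in range(size)]


def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [list(r) for r in matrix]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Find a row at or below position rk with a non-zero entry in this column.
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


if __name__ == "__main__":
    for size in (2, 4, 6):
        print(size, rank(hankel(size)))
```

The rank stops growing once the matrix is large enough, which is the property a spectral method exploits. With estimated probabilities the matrix is only approximately low-rank, which is why a PCA-style truncation is used in practice.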

international colloquium on grammatical inference | 2006

Learning multiplicity tree automata

Amaury Habrard; Jose Oncina

In this paper, we present a theoretical approach for the problem of learning multiplicity tree automata. These automata allow one to define functions which compute a number for each tree. They can be seen as a strict generalization of stochastic tree automata since they allow functions to be defined over any field K. A multiplicity automaton admits a support which is a non-deterministic automaton. From a grammatical inference point of view, this paper presents a contribution which is original due to the combination of two important aspects. This is the first time, as far as we know, that a learning method focuses on non-deterministic tree automata which compute functions over a field. The algorithm proposed in this paper stands in Angluin's exact model, where a learner is allowed to use membership and equivalence queries. We show that this algorithm runs in time polynomial in the size of the representation.
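
As a rough illustration of what "a function which computes a number for each tree" can look like, here is a minimal sketch of a two-dimensional multiplicity tree automaton over the rationals that counts leaves. It only shows evaluation, not the query-based learning algorithm of the paper. The tree encoding (nested tuples), the symbols `a` and `f`, and the names `LEAF_VECTORS`, `BINARY_TENSORS`, `FINAL`, `evaluate` and `value` are all assumptions made for this example.

```python
# Each leaf symbol is mapped to a vector of dimension 2.
LEAF_VECTORS = {"a": [1, 1]}

# Each binary symbol is mapped to a tensor t with
# out[k] = sum over i, j of t[i][j][k] * left[i] * right[j].
# Here: out[0] = left[0]*right[0] and out[1] = left[1]*right[0] + left[0]*right[1].
BINARY_TENSORS = {
    "f": [
        [[1, 0], [0, 1]],  # i = 0: (j=0 -> k=0), (j=1 -> k=1)
        [[0, 1], [0, 0]],  # i = 1: (j=0 -> k=1)
    ]
}

# The final vector reads the second coordinate, which accumulates the leaf count.
FINAL = [0, 1]


def evaluate(tree):
    """Return the state vector reached at the root of a tree given as nested tuples."""
    symbol, *children = tree
    if not children:
        return LEAF_VECTORS[symbol]
    left, right = (evaluate(child) for child in children)
    t = BINARY_TENSORS[symbol]
    dim = len(left)
    return [
        sum(t[i][j][k] * left[i] * right[j] for i in range(dim) for j in range(dim))
        for k in range(dim)
    ]


def value(tree):
    """Number assigned to the tree: dot product of the final vector and the root vector."""
    return sum(f * v for f, v in zip(FINAL, evaluate(tree)))


if __name__ == "__main__":
    tree = ("f", ("a",), ("f", ("a",), ("a",)))
    print(value(tree))
```

The function computed here is not a probability distribution, which fits the abstract's point that multiplicity tree automata strictly generalize stochastic tree automata.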

SSPR & SPR '08 Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition | 2008

Melody Recognition with Learned Edit Distances

Amaury Habrard; José M. Iñesta; David Rizo; Marc Sebban

In a music recognition task, the classification of a new melody is often achieved by looking for the closest piece in a set of already known prototypes. The definition of a relevant similarity measure then becomes a crucial point. So far, the edit distance approach with a priori fixed operation costs has been one of the most widely used to accomplish the task. In this paper, the application of a probabilistic learning model to both string and tree edit distances is proposed and is compared to a genetic algorithm cost fitting approach. The results show that both learning models outperform fixed-cost systems, and that the probabilistic approach is able to describe consistently the underlying melodic similarity model.
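
To make the role of the operation costs concrete, here is a minimal sketch of a string edit distance whose costs are passed in as functions. It is the classical dynamic program, not the probabilistic model or the genetic algorithm from the paper. The function names and the `pitch_sub` cost (a hand-written stand-in for a cost table that a learning method would produce) are assumptions for illustration only.

```python
def edit_distance(x, y, sub_cost, ins_cost, del_cost):
    """Minimum total cost of edit operations turning sequence x into sequence y."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost(x[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost(y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost(x[i - 1]),             # delete x[i-1]
                d[i][j - 1] + ins_cost(y[j - 1]),             # insert y[j-1]
                d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]),  # substitute or match
            )
    return d[n][m]


def unit_sub(a, b):
    """A priori fixed cost: every substitution of distinct symbols costs 1."""
    return 0.0 if a == b else 1.0


def unit_indel(a):
    """A priori fixed cost: every insertion or deletion costs 1."""
    return 1.0


def pitch_sub(a, b):
    """Hypothetical tuned cost: nearby pitches (as MIDI numbers) are cheap to swap."""
    return min(1.0, abs(a - b) / 12)


if __name__ == "__main__":
    melody, variant = [60, 62, 64], [60, 62, 65]
    print(edit_distance(melody, variant, unit_sub, unit_indel, unit_indel))
    print(edit_distance(melody, variant, pitch_sub, unit_indel, unit_indel))
```

With unit costs, the two melodies are as far apart as any pair differing in one note; with pitch-aware costs, a one-semitone change counts as a near match. Choosing such costs by hand is the tuning problem that learned edit distances aim to avoid.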

european conference on machine learning | 2007

Learning Metrics Between Tree Structured Data: Application to Image Recognition

Laurent Boyer; Amaury Habrard; Marc Sebban

The problem of learning metrics between structured data (strings, trees or graphs) has been the subject of various recent papers. With regard to the specific case of trees, some approaches focused on the learning of edit probabilities required to compute a so-called stochastic tree edit distance. However, to reduce the algorithmic and learning constraints, the deletion and insertion operations are performed on entire subtrees rather than on single nodes. In this article, we aim to fill this gap by learning a more general stochastic tree edit distance where node deletions and insertions are allowed. Our approach is based on an adaptation of the EM optimization algorithm to learn parameters of a tree model. We propose an original experimental approach that represents images as trees and then uses our learned metric in an image recognition task. Comparisons with a non-learned tree edit distance confirm the effectiveness of our approach.

european conference on machine learning | 2008

SEDiL: Software for Edit Distance Learning

Laurent Boyer; Yann Esposito; Amaury Habrard; Jose Oncina; Marc Sebban

In this paper, we present SEDiL, a Software for Edit Distance Learning. SEDiL is an innovative prototype implementation grouping together most of the state-of-the-art methods [1,2,3,4] that aim to automatically learn the parameters of string and tree edit distances.

european conference on machine learning | 2006

Learning stochastic tree edit distance

Marc Bernard; Amaury Habrard; Marc Sebban

Trees provide a suitable structural representation to deal with complex tasks such as web information extraction, RNA secondary structure prediction, or conversion of tree-structured documents. In this context, many applications require the calculation of similarities between tree pairs. The most studied distance is likely the tree edit distance (ED), for which improvements in terms of complexity have been achieved during the last decade. However, this classic ED usually uses a priori fixed edit costs, which are often difficult to tune and leave little room for tackling complex problems. In this paper, we focus on the learning of a stochastic tree ED. We use an adaptation of the Expectation-Maximization algorithm for learning the primitive edit costs. We carried out a series of experiments that confirm the benefit of learning a tree ED rather than imposing edit costs a priori.

arXiv: Machine Learning | 2014

Majority Vote of Diverse Classifiers for Late Fusion

Emilie Morvant; Amaury Habrard; Stéphane Ayache

In the past few years, a lot of attention has been devoted to multimedia indexing by fusing multimodal information. Two kinds of fusion schemes are generally considered: early fusion and late fusion. We focus on late classifier fusion, where one combines the scores of each modality at the decision level. To tackle this problem, we investigate a recent and elegant well-founded quadratic program named MinCq coming from the machine learning PAC-Bayesian theory. MinCq looks for the weighted combination, over a set of real-valued functions seen as voters, leading to the lowest misclassification rate, while maximizing the voters' diversity. We propose an extension of MinCq tailored to multimedia indexing. Our method is based on an order-preserving pairwise loss adapted to ranking that allows us to improve the Mean Average Precision measure while taking into account the diversity of the voters that we want to fuse. We provide evidence that this method is naturally adapted to late fusion procedures and confirm the good behavior of our approach on the challenging PASCAL VOC07 benchmark.
