Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Marc Bernard is active.

Publication


Featured research published by Marc Bernard.


Pattern Recognition | 2008

Learning probabilistic models of tree edit distance

Marc Bernard; Laurent Boyer; Amaury Habrard; Marc Sebban

Nowadays, there is a growing interest in machine learning and pattern recognition for tree-structured data. Trees provide a suitable structural representation for complex tasks such as web information extraction, RNA secondary structure prediction, computer music, or conversion of semi-structured data (e.g. XML documents). Many applications in these domains require the calculation of similarities over pairs of trees. In this context, the tree edit distance (ED) has been the subject of investigation for many years, mainly to improve its computational efficiency. However, used in its classical form, the tree ED needs a priori fixed edit costs, which are often difficult to tune and leave little room for tackling complex problems. In this paper, to overcome this drawback, we focus on the automatic learning of a non-parametric stochastic tree ED. More precisely, we are interested in two kinds of probabilistic approaches. The first builds a generative model of the tree ED from a joint distribution over the edit operations, while the second works from a conditional distribution, providing a discriminative model. To tackle these tasks, we present an adaptation of the expectation-maximization algorithm for learning these distributions over the primitive edit costs. Two experiments are conducted. The first is carried out on artificial data and confirms the interest of learning a tree ED rather than imposing edit costs a priori; the second is applied to a pattern recognition task aiming to classify handwritten digits.
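
For background, the sketch below computes the classical tree edit distance on ordered, labeled trees with a priori fixed unit costs, i.e. exactly the setting whose hand-tuning the paper seeks to avoid. It follows the standard rightmost-root recursion over forests with memoization; the representation (nested (label, children) tuples) and the function names are illustrative choices, and the probabilistic models learned in the paper are not reproduced here.

    # Classical tree edit distance with fixed unit costs, as a minimal sketch.
    # A tree is a pair (label, children); a forest is a tuple of trees.
    # This is the standard rightmost-root recursion over forests, memoized;
    # it is NOT the stochastic model learned in the paper.
    from functools import lru_cache

    C_DEL, C_INS, C_SUB = 1, 1, 1  # a priori fixed edit costs (the hand-tuned part)

    @lru_cache(maxsize=None)
    def forest_dist(F, G):
        if not F and not G:
            return 0
        if F and not G:                          # delete the rightmost root of F
            _, kids = F[-1]
            return forest_dist(F[:-1] + kids, G) + C_DEL
        if G and not F:                          # insert the rightmost root of G
            _, kids = G[-1]
            return forest_dist(F, G[:-1] + kids) + C_INS
        (vl, vk), (wl, wk) = F[-1], G[-1]
        return min(
            forest_dist(F[:-1] + vk, G) + C_DEL,               # delete v
            forest_dist(F, G[:-1] + wk) + C_INS,               # insert w
            forest_dist(vk, wk) + forest_dist(F[:-1], G[:-1])
            + (0 if vl == wl else C_SUB),                      # match/relabel v -> w
        )

    def tree_edit_distance(t1, t2):
        return forest_dist((t1,), (t2,))

    # Example: d( a(b, c), a(c) ) = 1 (one deletion of the leaf b).
    t1 = ("a", (("b", ()), ("c", ())))
    t2 = ("a", (("c", ()),))
    print(tree_edit_distance(t1, t2))            # -> 1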


International Colloquium on Grammatical Inference | 2006

A discriminative model of stochastic edit distance in the form of a conditional transducer

Marc Bernard; Jean-Christophe Janodet; Marc Sebban

Many real-world applications, such as spell checking or DNA analysis, use the Levenshtein edit distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilistic transducer, computed using grammatical inference techniques, which allows us to learn both the structure and the probabilities of the model. Beyond the fact that the learned transducers are neither deterministic nor stochastic in the standard terminology, they are conditional, and thus independent of the distribution of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context in which the edit operations are used. In other words, we obtain a kind of context-sensitive edit distance.
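
As a point of reference, here is a minimal sketch of a memoryless generative stochastic edit distance over strings, in which the score of a pair is the total probability of all edit scripts relating them, computed by a forward dynamic program; the negative log-probability then plays the role of a distance. This is not the conditional transducer of the paper: the joint operation probabilities below are hand-set for illustration rather than learned, and the termination probability of the full model is omitted.

    # Memoryless generative stochastic edit distance (sketch). The probabilities
    # are hand-set for illustration, not learned, and termination is omitted.
    import math

    EPS = ""                                     # the empty symbol

    def edit_probability(x, y, p):
        """p maps (a, b) pairs, with a or b possibly EPS, to probabilities."""
        n, m = len(x), len(y)
        alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
        alpha[0][0] = 1.0
        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0:                        # deletion of x[i-1]
                    alpha[i][j] += p.get((x[i-1], EPS), 0.0) * alpha[i-1][j]
                if j > 0:                        # insertion of y[j-1]
                    alpha[i][j] += p.get((EPS, y[j-1]), 0.0) * alpha[i][j-1]
                if i > 0 and j > 0:              # substitution / match
                    alpha[i][j] += p.get((x[i-1], y[j-1]), 0.0) * alpha[i-1][j-1]
        return alpha[n][m]

    def stochastic_edit_distance(x, y, p):
        """Negative log-probability, so that more likely pairs are 'closer'."""
        return -math.log(edit_probability(x, y, p))

    # Toy parameters over the alphabet {a, b}: matches are likely,
    # substitutions, insertions and deletions are not.
    p = {("a", "a"): 0.35, ("b", "b"): 0.35,
         ("a", "b"): 0.05, ("b", "a"): 0.05,
         ("a", EPS): 0.05, ("b", EPS): 0.05,
         (EPS, "a"): 0.05, (EPS, "b"): 0.05}
    print(stochastic_edit_distance("ab", "ab", p) < stochastic_edit_distance("ab", "bb", p))  # True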


European Conference on Machine Learning | 2006

Learning stochastic tree edit distance

Marc Bernard; Amaury Habrard; Marc Sebban

Trees provide a suitable structural representation for complex tasks such as web information extraction, RNA secondary structure prediction, or conversion of tree-structured documents. In this context, many applications require the calculation of similarities between pairs of trees. The most studied such distance is likely the tree edit distance (ED), for which improvements in terms of complexity have been achieved during the last decade. However, this classic ED usually relies on a priori fixed edit costs, which are often difficult to tune and leave little room for tackling complex problems. In this paper, we focus on the learning of a stochastic tree ED. We use an adaptation of the expectation-maximization algorithm for learning the primitive edit costs. We carried out a series of experiments that confirm the interest of learning a tree ED rather than imposing edit costs a priori.
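
To make the learning step concrete, the sketch below transposes the idea to the string case: one expectation-maximization iteration accumulates expected counts of the primitive edit operations with a forward-backward pass over each training pair, then renormalizes them. It is only an illustrative analogue, assuming a memoryless joint model without termination probability; the tree algorithm of the paper is not reproduced, and all names are invented.

    # EM sketch for primitive edit probabilities on string pairs (memoryless
    # joint model, termination omitted). Illustrative analogue of the paper's
    # tree algorithm, not a reproduction of it.
    from collections import defaultdict

    EPS = ""

    def forward(x, y, p):
        n, m = len(x), len(y)
        a = [[0.0] * (m + 1) for _ in range(n + 1)]
        a[0][0] = 1.0
        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0: a[i][j] += p.get((x[i-1], EPS), 0.0) * a[i-1][j]
                if j > 0: a[i][j] += p.get((EPS, y[j-1]), 0.0) * a[i][j-1]
                if i > 0 and j > 0: a[i][j] += p.get((x[i-1], y[j-1]), 0.0) * a[i-1][j-1]
        return a

    def backward(x, y, p):
        n, m = len(x), len(y)
        b = [[0.0] * (m + 1) for _ in range(n + 1)]
        b[n][m] = 1.0
        for i in range(n, -1, -1):
            for j in range(m, -1, -1):
                if i < n: b[i][j] += p.get((x[i], EPS), 0.0) * b[i+1][j]
                if j < m: b[i][j] += p.get((EPS, y[j]), 0.0) * b[i][j+1]
                if i < n and j < m: b[i][j] += p.get((x[i], y[j]), 0.0) * b[i+1][j+1]
        return b

    def em_step(pairs, p):
        """One EM iteration: accumulate expected operation counts (E), renormalize (M)."""
        counts = defaultdict(float)
        for x, y in pairs:
            a, b = forward(x, y, p), backward(x, y, p)
            z = a[len(x)][len(y)]                # likelihood of the pair
            for i in range(len(x) + 1):
                for j in range(len(y) + 1):
                    if i < len(x):               # deletion used at (i, j)
                        counts[(x[i], EPS)] += a[i][j] * p.get((x[i], EPS), 0.0) * b[i+1][j] / z
                    if j < len(y):               # insertion used at (i, j)
                        counts[(EPS, y[j])] += a[i][j] * p.get((EPS, y[j]), 0.0) * b[i][j+1] / z
                    if i < len(x) and j < len(y):  # substitution / match at (i, j)
                        counts[(x[i], y[j])] += a[i][j] * p.get((x[i], y[j]), 0.0) * b[i+1][j+1] / z
        total = sum(counts.values())
        return {op: c / total for op, c in counts.items()}

    def uniform_init(alphabet):
        symbols = list(alphabet) + [EPS]
        ops = [(a, b) for a in symbols for b in symbols if (a, b) != (EPS, EPS)]
        return {op: 1.0 / len(ops) for op in ops}

    pairs = [("abb", "ab"), ("ba", "baa"), ("ab", "ab")]
    p = uniform_init("ab")
    for _ in range(10):                          # a few EM iterations
        p = em_step(pairs, p)
    print(max(p, key=p.get))                     # the most likely primitive operation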


International Colloquium on Grammatical Inference | 2002

Generalized Stochastic Tree Automata for Multi-relational Data Mining

Amaury Habrard; Marc Bernard; François Jacquenet

This paper addresses the problem of learning a statistical distribution of the data in a relational database. The data we focus on are represented as trees, which are a natural way to represent structured information. These trees are then used to infer a stochastic tree automaton, using a well-known grammatical inference algorithm. We propose two extensions of this algorithm: the use of sorts and the generalization of the inferred automaton according to a local criterion. We show through experiments that our approach scales to large databases and improves both the predictive power of the learned model and the convergence of the learning algorithm.
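
A hypothetical sketch of the representation step: one record of a relational database, together with the rows referencing it, is flattened into a labeled tree (nested (label, children) tuples) from which a stochastic tree automaton could then be inferred. Table and column names are invented, and the inference algorithm and its two extensions are not shown.

    # Hypothetical sketch: turning one relational record plus its related rows
    # into a labeled tree. Table and column names are invented for illustration.

    def leaf(value):
        return (str(value), ())

    def record_to_tree(customer, orders):
        """customer: a dict for one row; orders: the rows referencing it."""
        return ("customer", (
            ("country", (leaf(customer["country"]),)),
            ("orders", tuple(
                ("order", (
                    ("product", (leaf(o["product"]),)),
                    ("quantity", (leaf(o["quantity"]),)),
                ))
                for o in orders
            )),
        ))

    customer = {"country": "FR"}
    orders = [{"product": "book", "quantity": 2}, {"product": "pen", "quantity": 5}]
    print(record_to_tree(customer, orders))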


Artificial Intelligence in Medicine in Europe | 2003

Multi-Relational Data Mining in Medical Databases

Amaury Habrard; Marc Bernard; François Jacquenet

This paper presents the application of a method for mining data in a multi-relational database containing information about patients affected by chronic hepatitis. Our approach may be used on any kind of multi-relational database and aims at extracting probabilistic tree patterns from the database using grammatical inference techniques. We propose to represent the database as trees in order to extract these patterns. Trees provide a natural way to represent structured information while taking into account the statistical distribution of the data. In this work we show how such patterns can be useful for interpreting knowledge in the medical domain.


European Conference on Machine Learning | 2003

Improvement of the state merging rule on noisy data in probabilistic grammatical inference

Amaury Habrard; Marc Bernard; Marc Sebban

In this paper we study the influence of noise in probabilistic grammatical inference. We paradoxically bring out the idea that specialized automata deal better with noisy data than more general ones. We then propose to replace the statistical test of the ALERGIA algorithm by a more restrictive merging rule based on a proportion-comparison test. We experimentally show that this approach produces larger automata that better handle noisy data, according to two different performance criteria (perplexity and distance to the target model).
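
As an illustration of the kind of change involved, the sketch below contrasts ALERGIA's Hoeffding-style compatibility bound with a classical two-proportion z-test, which is more restrictive on large samples and therefore merges fewer states. The frequencies compared are the counts of one symbol (or of termination) out of two candidate states; the exact rule used in the paper may differ in its details.

    # Two merging criteria for state-merging inference of probabilistic automata:
    # ALERGIA's Hoeffding-style bound vs. a classical two-proportion z-test.
    # f1/n1 and f2/n2 are observed frequencies of one symbol (or termination)
    # out of two candidate states.
    import math

    def alergia_compatible(f1, n1, f2, n2, alpha=0.05):
        """Hoeffding-style test from ALERGIA: True if the difference is within the bound."""
        bound = (math.sqrt(0.5 * math.log(2.0 / alpha))
                 * (1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2)))
        return abs(f1 / n1 - f2 / n2) < bound

    def proportions_compatible(f1, n1, f2, n2, z_crit=1.96):
        """Two-proportion z-test: True if equality of proportions cannot be rejected."""
        p_pool = (f1 + f2) / (n1 + n2)
        if p_pool in (0.0, 1.0):                 # identical degenerate proportions
            return True
        se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n1 + 1.0 / n2))
        z = (f1 / n1 - f2 / n2) / se
        return abs(z) <= z_crit

    # With large samples the z-test rejects small differences that the Hoeffding
    # bound still tolerates, leading to fewer merges and larger automata.
    print(alergia_compatible(470, 1000, 530, 1000))      # True: within the Hoeffding bound
    print(proportions_compatible(470, 1000, 530, 1000))  # False: the z-test refuses the merge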


International Conference on Tools with Artificial Intelligence | 2012

(Generic) Packages for Logic Programs

François Jacquenet; Marc Bernard

Logic programming is a programming paradigm that has been widely adopted for software development in the domain of artificial intelligence (natural language processing, planning, solving, etc.). Even if pure logic programming has been superseded by other families of programming languages (constraint logic programming, description logic, etc.), it remains at the core of much research. However, while imperative programming languages have seen the development of many concepts and tools for large-scale software development, logic programming languages have remained relatively impervious to such considerations and have not really benefited from research in software engineering. In this paper we present the p-Prolog language, which offers robust techniques from the software engineering research that led to the design of the Ada language. After presenting the syntax of the language, we present its semantics using transition rules extending those of SLD resolution.


International Conference on Tools with Artificial Intelligence | 2011

Using the H-Divergence to Prune Probabilistic Automata

Marc Bernard; Baptiste Jeudy; Jean-Philippe Peyrache; Marc Sebban; Franck Thollard

A problem usually encountered in probabilistic automata learning is the difficulty of dealing with large training samples and/or wide alphabets. This is partially due to the size of the resulting Probabilistic Prefix Tree (PPT), to which state-merging learning algorithms are generally applied. In this paper, we propose a novel method to prune PPTs by making use of the H-divergence d_H, recently introduced in the field of domain adaptation. d_H is based on the classification error made by a hypothesis learned from unlabeled examples drawn according to the two distributions being compared. Through a thorough comparison with state-of-the-art divergence measures, we provide experimental evidence demonstrating the efficiency of our method based on this simple and intuitive criterion.
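
The sketch below builds a Probabilistic Prefix Tree from a sample of strings and prunes rarely visited subtrees. The pruning criterion is a plain count threshold used as a stand-in for illustration; the H-divergence criterion proposed in the paper is not implemented here.

    # Minimal sketch: build a Probabilistic Prefix Tree (PPT) from a sample of
    # strings, then prune rarely visited subtrees. The count threshold below is
    # a stand-in for illustration; it is NOT the H-divergence criterion of the
    # paper. State-merging algorithms would then run on the pruned PPT.

    class Node:
        def __init__(self):
            self.count = 0        # how many sample strings pass through this node
            self.ends = 0         # how many sample strings end here
            self.children = {}    # symbol -> Node

    def build_ppt(sample):
        root = Node()
        for s in sample:
            node = root
            node.count += 1
            for symbol in s:
                node = node.children.setdefault(symbol, Node())
                node.count += 1
            node.ends += 1
        return root

    def prune(node, min_count):
        """Drop every subtree reached by fewer than min_count sample strings."""
        node.children = {sym: child for sym, child in node.children.items()
                         if child.count >= min_count}
        for child in node.children.values():
            prune(child, min_count)

    sample = ["ab", "ab", "abc", "b", "b", "abd"]
    ppt = build_ppt(sample)
    prune(ppt, min_count=2)       # the rare 'c' and 'd' branches (count 1) are dropped
    print(sorted(ppt.children))   # -> ['a', 'b']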


TRECVID 2011 - TREC Video Retrieval Evaluation Online | 2011

IRIM at TRECVID 2011: Semantic Indexing and Instance Search

Nicolas Ballas; Benjamin Labbé; Aymen Shabou; Philippe Gosselin; Miriam Redi; Marc Bernard; Jonathan Delhumeau; Boris Mansencal; Jenny Benois-Pineau; Abdelkader Haadi; Bahjat Safadi; Franck Thollard; Nadia Derbas; Liming Chen; Alexandre Benoît; Patrick Lambert; Tiberius Strat; Joseph Razik; Dijana Petrovska; Andrei Stoian; Michel Crucianu


Journal of Universal Computer Science | 1997

Free Space Modeling for Placing Rectangles without Overlapping.

Marc Bernard; François Jacquenet

Collaboration


Dive into Marc Bernard's collaborations.

Top Co-Authors

Marc Sebban

Centre national de la recherche scientifique

Andrei Stoian

Conservatoire national des arts et métiers