
Publications


Featured research published by Colin de la Higuera.


Pattern Recognition | 2005

A bibliographical study of grammatical inference

Colin de la Higuera

The field of grammatical inference (also known as grammar induction) is transversal to a number of research areas including machine learning, formal language theory, syntactic and structural pattern recognition, computational linguistics, computational biology and speech recognition. There is no uniform literature on the subject, and one can find many papers with original definitions or points of view. This makes research in this subject very hard, mainly for a beginner or for someone who does not wish to become a specialist but only to find the ideas most suitable for their own research. The goal of this paper is to introduce a number of papers related to grammatical inference. Some of these papers are essential and should constitute a common background to research in the area, whereas others are specialized in particular problems or techniques but can be of great help on specific tasks.


Machine Learning | 1997

Characteristic Sets for Polynomial Grammatical Inference

Colin de la Higuera

When concerned about efficient grammatical inference two issues are relevant: the first one is to determine the quality of the result, and the second is to try to use polynomial time and space. A typical idea to deal with the first point is to say that an algorithm performs well if it infers in the limit the correct language. The second point has led to debate about how to define polynomial time: the main definitions of polynomial inference have been proposed by Pitt and Angluin. We return in this paper to a definition proposed by Gold that requires a characteristic set of strings to exist for each grammar, and this set to be polynomial in the size of the grammar or automaton that is to be learned, where the size of the sample is the sum of the lengths of all strings it includes. The learning algorithm must also infer correctly as soon as the characteristic set is included in the data. We first show that this definition corresponds to a notion of teachability as defined by Goldman and Mathias. By adapting their teacher/learner model to grammatical inference we prove that languages given by context-free grammars, simple deterministic grammars, linear grammars and nondeterministic finite automata are not identifiable in the limit from polynomial time and data.


international colloquium on grammatical inference | 2000

Computational Complexity of Problems on Probabilistic Grammars and Transducers

Francisco Casacuberta; Colin de la Higuera

Determinism plays an important role in grammatical inference. In practice, however, ambiguous grammars (and nondeterministic grammars in particular) are used more often than deterministic ones. Computing the probability of parsing a given string, or its most probable parse, with stochastic regular grammars can be performed in linear time. However, the problem of finding the most probable string has not yet received a satisfactory answer. In this paper we prove that the problem is NP-hard and does not allow for a polynomial-time approximation scheme. The result extends to stochastic regular syntax-directed translation schemes.
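The linear-time parsing computation mentioned in the abstract can be pictured as a forward pass over a probabilistic finite automaton. The two-state automaton below is a hypothetical toy example, not one from the paper; it is a minimal sketch of the dynamic program, assuming the grammar is given as state transitions with probabilities and per-state stopping probabilities.

```python
# Probability of a string under a stochastic regular grammar, represented
# as a probabilistic finite automaton (PFA).
# trans[state][symbol] -> list of (next_state, prob); final[state] -> stop prob.

def string_probability(s, trans, final, start=0):
    # Forward dynamic programming: alpha[q] is the probability of having
    # read the prefix consumed so far and being in state q.
    alpha = {start: 1.0}
    for sym in s:
        nxt = {}
        for q, p in alpha.items():
            for q2, tp in trans.get(q, {}).get(sym, []):
                nxt[q2] = nxt.get(q2, 0.0) + p * tp
        alpha = nxt
    # Multiply by the stopping probability of each reachable state.
    return sum(p * final.get(q, 0.0) for q, p in alpha.items())

# A hypothetical two-state automaton over the alphabet {a, b}.
trans = {
    0: {"a": [(0, 0.5), (1, 0.2)], "b": [(1, 0.2)]},
    1: {"b": [(1, 0.4)]},
}
final = {0: 0.1, 1: 0.6}

print(string_probability("ab", trans, final))  # one pass per symbol
```

The loop does constant work per symbol for a fixed automaton, hence the linear time noted in the abstract; finding the single most probable *string*, by contrast, is exactly the problem the paper proves NP-hard.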


Lecture Notes in Computer Science | 2000

Current Trends in Grammatical Inference

Colin de la Higuera

Grammatical inference historically found its first theoretical results in the field of inductive inference, and its first applications in that of Syntactic and Structural Pattern Recognition. In the mid-nineties the field became independent, and researchers from a variety of communities moved in: Computational Linguistics, Natural Language Processing, Algorithmics, Speech Recognition, Bio-Informatics, Computational Learning Theory, Machine Learning. We claim that this interaction has been fruitful, allowing within a few years the appearance of formal theoretical results establishing the quality (or lack thereof) of Grammatical Inference techniques and, probably more importantly, the discovery of new algorithms that can infer a variety of types of grammars and automata from heterogeneous data.


Archive | 1996

Grammatical Inference: Learning Syntax from Sentences

Laurent Miclet; Colin de la Higuera

Contents:
- Learning grammatical structure using statistical decision-trees
- Inductive inference from positive data: from heuristic to characterizing methods
- Unions of identifiable families of languages
- Characteristic sets for polynomial grammatical inference
- Query learning of subsequential transducers
- Lexical categorization: Fitting template grammars by incremental MDL optimization
- Selection criteria for word trigger pairs in language modeling
- Clustering of sequences using a minimum grammar complexity criterion
- A note on grammatical inference of slender context-free languages
- Learning linear grammars from structural information
- Learning of context-sensitive language acceptors through regular inference and constraint induction
- Inducing constraint grammars
- Introducing statistical dependencies and structural constraints in variable-length sequence models
- A disagreement count scheme for inference of constrained Markov networks
- Using knowledge to improve N-Gram language modelling through the MGGI methodology
- Discrete sequence prediction with commented Markov models
- Learning k-piecewise testable languages from positive data
- Learning code regular and code linear languages
- Incremental regular inference
- An incremental interactive algorithm for regular grammar inference
- Inductive logic programming for discrete event systems
- Stochastic simple recurrent neural networks
- Inferring stochastic regular grammars with recurrent neural networks
- Maximum mutual information and conditional maximum likelihood estimations of stochastic regular syntax-directed translation schemes
- Grammatical inference using Tabu Search
- Using domain information during the learning of a subsequential transducer
- Identification of DFA: Data-dependent versus data-independent algorithms


international colloquium on grammatical inference | 1996

Characteristic sets for polynomial grammatical inference

Colin de la Higuera

When concerned about efficient grammatical inference two issues are relevant: the first one is to determine the quality of the result, and the second is to try to use polynomial time and space. A typical idea to deal with the first point is to say that an algorithm performs well if it infers in the limit the correct language. The second point has led to debate about how to define polynomial time: the main definitions of polynomial inference have been proposed by Pitt and Angluin. We return in this paper to a definition proposed by Gold that requires a characteristic set of strings to exist for each grammar, and this set to be polynomial in the size of the grammar or automaton that is to be learned, where the size of the sample is the sum of the lengths of all strings it includes. The learning algorithm must also infer correctly as soon as the characteristic set is included in the data. We first show that this definition corresponds to a notion of teachability as defined by Goldman and Mathias. By adapting their teacher/learner model to grammatical inference we prove that languages given by context-free grammars, simple deterministic grammars, linear grammars and nondeterministic finite automata are not identifiable in the limit from polynomial time and data.


international colloquium on grammatical inference | 1996

Identification of DFA: data-dependent vs data-independent algorithms

Colin de la Higuera; Jose Oncina; Enrique Vidal

Algorithms that infer deterministic finite automata from given data and that comply with the identification-in-the-limit condition have been thoroughly tested and are in practice often preferred to elaborate heuristics. Even if there is no guarantee of identification from the available data, the existence of associated characteristic sets means that these algorithms converge towards the correct solution. In this paper we construct a framework for algorithms with this property, and consider algorithms that use the quantity of information to direct their strategy. These data-dependent algorithms still identify in the limit but may require an exponential characteristic set to do so. Nevertheless, preliminary practical evidence suggests that they could perform better.


Computer Vision and Image Understanding | 2011

Polynomial algorithms for subisomorphism of nD open combinatorial maps

Guillaume Damiand; Christine Solnon; Colin de la Higuera; Jean-Christophe Janodet; Émilie Samuel

Combinatorial maps describe the subdivision of objects into cells, together with the incidence and adjacency relations between cells, and they are widely used to model 2D and 3D images. However, there is no algorithm for comparing combinatorial maps, which is an important issue for image processing and analysis. In this paper, we address two basic comparison problems, i.e., map isomorphism, which involves deciding if two maps are equivalent, and submap isomorphism, which involves deciding if a copy of a pattern map may be found in a target map. We formally define these two problems for nD open combinatorial maps, give polynomial-time algorithms for solving them, and illustrate their interest and feasibility for searching patterns in 2D and 3D images, much as a child searches for Wally in Martin Handford's books.


international colloquium on grammatical inference | 2000

Identification in the Limit with Probability One of Stochastic Deterministic Finite Automata

Colin de la Higuera; Franck Thollard

The current formal proof that stochastic deterministic finite automata can be identified in the limit with probability one makes use of a simplified state-merging algorithm. We prove in this paper that the Alergia algorithm, and its extensions, which may use some blue fringe type of ordering, can also identify distributions generated by stochastic deterministic finite automata. We also give a new algorithm enabling us to identify the actual probabilities, even though in practice, the number of examples needed can still be overwhelming.
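Alergia-style algorithms decide whether two states of the prefix automaton can be merged by testing whether their empirical transition frequencies are statistically close. The function below is a sketch of the standard Hoeffding-bound compatibility test used by this family of algorithms; the counts in the example are hypothetical, and real implementations apply the test recursively across all symbols and successor states.

```python
import math

def hoeffding_different(f1, n1, f2, n2, alpha=0.05):
    # Alergia-style test: are the empirical frequencies f1/n1 and f2/n2
    # significantly different at confidence parameter alpha?
    # If not, the two states are considered compatible and may be merged.
    if n1 == 0 or n2 == 0:
        return False
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (
        1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2)
    )
    return abs(f1 / n1 - f2 / n2) > bound

# Hypothetical counts: two states visited 100 and 80 times, taking a given
# transition 30 and 27 times; frequencies 0.30 vs 0.3375 are not
# significantly different, so the states remain merge candidates.
print(hoeffding_different(30, 100, 27, 80))
```

The bound shrinks as the visit counts grow, which is why, as the abstract notes, identification holds with probability one in the limit even though the number of examples needed in practice can be overwhelming.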


finite state methods and natural language processing | 2009

Zulu: an interactive learning competition

David Combe; Colin de la Higuera; Jean-Christophe Janodet

Active language learning is an interesting task for which theoretical results are known and several applications exist. In order to better understand what the best strategies may be, a new competition called Zulu (http://labh-curien.univ-st-etienne.fr/zulu/) is launched: participants are invited to learn deterministic finite automata from membership queries. The goal is to obtain the best classification rate from a fixed number of queries.
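The Zulu protocol can be pictured as a learner interrogating a budget-limited membership oracle and then classifying unseen strings. The oracle and target language below are hypothetical stand-ins for the competition server, and the memorize-plus-majority learner is only a baseline sketch of the interaction, not a competitive strategy.

```python
# Hypothetical stand-in for the Zulu server: a hidden DFA over {a, b}
# (here: strings with an even number of a's), queried through a
# budget-limited membership oracle.

class Oracle:
    def __init__(self, budget):
        self.budget = budget
        self.queries = 0

    def member(self, s):
        if self.queries >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.queries += 1
        return s.count("a") % 2 == 0  # hidden target language

def classify(oracle, train_strings, test_strings):
    # Baseline learner: memorize the labels of queried strings and label
    # unseen test strings with the majority training label.
    labels = {s: oracle.member(s) for s in train_strings}
    majority = sum(labels.values()) >= len(labels) / 2
    return [labels.get(s, majority) for s in test_strings]

oracle = Oracle(budget=10)
preds = classify(oracle, ["", "a", "aa", "ab"], ["aa", "b", "aba"])
print(preds, "queries used:", oracle.queries)
```

A real participant would replace `classify` with an active learner (e.g. an L*-style observation-table method) that chooses which strings to query; the scoring stays the same, classification rate under a fixed query budget.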

Collaboration


Dive into Colin de la Higuera's collaborations.

Top co-authors:

Jose Oncina, University of Alicante
Christine Solnon, Institut national des sciences appliquées de Lyon
Pierre Dupont, Université catholique de Louvain