Erick Alphonse
University of Paris
Publications
Featured research published by Erick Alphonse.
international conference on computational linguistics | 2004
Erick Alphonse; Sophie Aubin; Philippe Bessières; Gilles Bisson; Thierry Hamon; Sandrine Lagarrigue; Adeline Nazarenko; Alain-Pierre Manine; Claire Nédellec; Mohamed Ould Abdel Vetah; Thierry Poibeau; Davy Weissenbacher
This paper gives an overview of the Caderige project. This project brings together teams from different areas (biology, machine learning, natural language processing) to develop high-level analysis tools for extracting structured information from biological bibliographical databases, especially Medline. The paper presents the approach and compares it to the state of the art.
International Journal of Medical Informatics | 2009
Alain-Pierre Manine; Erick Alphonse; Philippe Bessières
INTRODUCTION: Information extraction (IE) systems have been proposed in recent years to extract genic interactions from bibliographical resources. They are limited to single interaction relations and face a trade-off between recall and precision, focusing either on specific interactions (for precision) or on general, unspecified interactions of biological entities (for recall). Yet biologists need to process more complex data from the literature in order to study biological pathways. An ontology is an adequate formal representation to model this sophisticated knowledge. However, the tight integration of IE systems and ontologies is still an open research issue, a fortiori with complex ontologies that go beyond hierarchies.
METHOD: We propose a rich modelling of genic interactions with an ontology and show how it can be used within an IE system. The ontology is seen as a language specifying a normalized representation of the text. First, IE is performed by extracting instances from natural language processing (NLP) modules. Then, deductive inferences on the ontology language are carried out, and new instances are derived from previously extracted ones. Inference rules are learnt with an inductive logic programming (ILP) algorithm, using the ontology as the hypothesis language and its instantiation on an annotated corpus as the example language. Learning is set in a multi-class setting to deal with the multiple ontological relations.
RESULTS: We validated our approach on an annotated corpus of gene transcription regulations in the Bacillus subtilis bacterium. We reach a global recall of 89.3% and a precision of 89.6%, with high scores for the ten semantic relations defined in the ontology.
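The deduction step described in the abstract can be illustrated with a small sketch (ours, not the authors' system): new ontology instances are derived from previously extracted ones by forward chaining over inference rules. The relation names, gene names and the triple encoding below are hypothetical.

```python
# Illustrative sketch only (not the authors' system): deriving new
# ontology instances from extracted ones by forward chaining over
# inference rules. All relation and entity names are hypothetical.

def unify(term, value, binding):
    """Match a term ('?x' variables or constants) against a value."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value

def match_all(premises, facts, binding):
    """Yield every variable binding satisfying all premise triples."""
    if not premises:
        yield binding
        return
    rel, s, o = premises[0]
    for fr, fs, fo in facts:
        if fr != rel:
            continue
        b = dict(binding)
        if unify(s, fs, b) and unify(o, fo, b):
            yield from match_all(premises[1:], facts, b)

def substitute(triple, binding):
    rel, s, o = triple
    return (rel, binding.get(s, s), binding.get(o, o))

def forward_chain(facts, rules):
    """Apply (premises, conclusion) rules until no new triple is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # snapshot the fact base so it is not mutated while matching
            derived = [substitute(conclusion, b)
                       for b in match_all(premises, frozenset(facts), {})]
            for new in derived:
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts
```

For instance, with the (hypothetical) rule "if A promotes gene G and P is the product of G, then A regulates P", the extracted triples ("promotes", "sigB", "katB") and ("product_of", "KatB", "katB") yield the derived instance ("regulates", "sigB", "KatB").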
intelligent information systems | 2004
Erick Alphonse; Stan Matwin
Attribute-value based representations, standard in today's data mining systems, have a limited expressiveness. Inductive Logic Programming provides an interesting alternative, particularly for learning from structured examples whose parts, each with its own attributes, are related to each other by means of first-order predicates. Several subsets of first-order logic (FOL) with different expressive power have been proposed in Inductive Logic Programming (ILP). The challenge lies in the fact that the more expressive the subset of FOL the learner works with, the more critical the dimensionality of the learning task. The Datalog language is expressive enough to represent realistic learning problems when data is given directly in a relational database, making it a suitable tool for data mining. Consequently, it is important to elaborate techniques that dynamically decrease the dimensionality of learning tasks expressed in Datalog, just as Feature Subset Selection (FSS) techniques do in attribute-value learning. The idea of re-using these techniques in ILP immediately runs into a problem, as ILP examples have variable size and do not share the same set of literals. We propose here the first paradigm that brings Feature Subset Selection to the level of ILP, in languages at least as expressive as Datalog. The main idea is to first perform a change of representation, which approximates the original relational problem by a multi-instance problem. The resulting representation is suitable for FSS techniques, which we adapted from attribute-value learning by taking into account characteristics of the data due to the change of representation. We present the simple FSS proposed for the task, the requisite change of representation, and the entire method combining the two algorithms. The method acts as a filter that preprocesses the relational data prior to model building and outputs relational examples with empirically relevant literals. We discuss experiments in which the method was successfully applied to two real-world domains.
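The filtering idea behind the method can be sketched as follows (a minimal sketch under our own assumptions, not the paper's algorithm): once a relational example has been approximated by a bag of boolean instances, a feature whose bag-level presence is identical across all bags can never discriminate between examples and may be filtered out.

```python
# Hedged sketch, not the paper's algorithm: after the change of
# representation, each relational example is a bag of boolean
# instances. A feature is dropped when its bag-level presence
# (true in at least one instance of the bag) is constant over
# all bags, since it cannot separate any two examples.

def bag_presence(bag, j):
    """True if feature j holds in at least one instance of the bag."""
    return any(inst[j] for inst in bag)

def relevant_features(bags, n_features):
    """Keep only features whose bag-level presence varies across bags."""
    keep = []
    for j in range(n_features):
        values = {bag_presence(bag, j) for bag in bags}
        if len(values) > 1:
            keep.append(j)
    return keep
```

On two bags [[(1, 0), (0, 0)]] and [[(1, 1), (0, 1)]], feature 0 is present in both bags and is dropped, while feature 1 is present only in the second bag and is kept.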
international conference on tools with artificial intelligence | 2008
Alain-Pierre Manine; Erick Alphonse; Philippe Bessières
Ontologies are a well-motivated formal representation to model knowledge needed to extract and encode data from text. Yet, their tight integration with Information Extraction (IE) systems is still a research issue, a fortiori with complex ones that go beyond hierarchies. In this paper, we introduce an original architecture where IE is specified by designing an ontology, and the extraction process is seen as an Ontology Population (OP) task. Concepts and relations of the ontology define a normalized text representation. As their abstraction level is irrelevant for text extraction, we introduced a Lexical Layer (LL) along with the ontology, i.e. relations and classes at an intermediate level of normalization between raw text and concepts. On the contrary to previous IE systems, the extraction process only involves normalizing the outputs of Natural Language Processing (NLP) modules with instances of the ontology and the LL. All the remaining reasoning is left to a query module, which uses the inference rules of the ontology to derive new instances by deduction. In this context, these inference rules subsume classical extraction rules or patterns by providing access to appropriate abstraction level and domain knowledge. To acquire those rules, we adopt an Ontology Learning (OL) perspective, and automatically acquire the inference rules with relational Machine Learning (ML). Our approach is validated on a genic interaction extraction task from a Bacillus subtilis bacterium text corpus. We reach a global recall of 89.3% and a precision of 89.6%, with high scores for the ten conceptual relations in the ontology.
inductive logic programming | 2007
Erick Alphonse; Céline Rouveirol
Several upgrades of attribute-value learning to Inductive Logic Programming have been proposed and used successfully. However, the top-down data-driven strategy, popularised by the AQ family, has not yet been transferred to ILP: although the idea of reducing the hypothesis space by covering a seed example is used in systems like PROGOL, Aleph or MIO, these systems do not benefit from the associated data-driven specialisation operator. This operator is given an incorrect hypothesis h and a covered negative example e, and outputs a set of hypotheses more specific than h and correct with respect to e. This refinement operator is very valuable considering the heuristic search problems ILP systems may encounter when crossing plateaus in relational search spaces. In this paper, we present the data-driven strategy of AQ in terms of an lgg-based change of representation of negative examples given a positive seed example, and show how it can be extended to ILP. We evaluate a basic implementation of this strategy in the system Propal on a number of benchmark ILP datasets.
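The lgg (least general generalisation) underlying the change of representation above can be sketched on atoms (a simplified sketch of Plotkin's operation; the tuple encoding of terms is our own assumption):

```python
# Sketch of Plotkin's least general generalisation (lgg) of two atoms,
# the building block behind the lgg-based change of representation
# discussed above. Encoding is our assumption: constants are strings,
# compound terms are (functor, (args...)) pairs.

import itertools

def lgg_term(t1, t2, table, fresh):
    if t1 == t2:
        return t1
    same_functor = (isinstance(t1, tuple) and isinstance(t2, tuple)
                    and t1[0] == t2[0] and len(t1[1]) == len(t2[1]))
    if same_functor:
        return (t1[0], tuple(lgg_term(a, b, table, fresh)
                             for a, b in zip(t1[1], t2[1])))
    # the same pair of mismatching terms is always replaced by the
    # same variable: this is what makes the result *least* general
    if (t1, t2) not in table:
        table[(t1, t2)] = f"X{next(fresh)}"
    return table[(t1, t2)]

def lgg_atoms(a1, a2):
    pred1, args1 = a1
    pred2, args2 = a2
    assert pred1 == pred2 and len(args1) == len(args2)
    table, fresh = {}, itertools.count()
    return (pred1, tuple(lgg_term(x, y, table, fresh)
                         for x, y in zip(args1, args2)))
```

For example, the lgg of p(a, b, a) and p(c, b, c) is p(X0, b, X0): the repeated mismatch (a, c) is mapped to the same variable both times.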
european conference on principles of data mining and knowledge discovery | 1999
Erick Alphonse; Céline Rouveirol
A number of Inductive Logic Programming (ILP) systems have addressed the problem of learning first-order logic (FOL) discriminant definitions by first reformulating the problem expressed in a FOL framework into an attribute-value problem and then applying efficient algebraic learning techniques. The complexity of such propositionalization methods then lies in the size of the reformulated problem, which can be exponential. We propose a method that selectively propositionalizes the FOL training set by interleaving boolean reformulation and algebraic resolution. It avoids, as much as possible, the generation of redundant boolean examples, and still ensures that explicit correct and complete definitions are learned.
inductive logic programming | 2008
Erick Alphonse; Aomar Osmani
The feasibility of symbolic learning strongly relies on the efficiency of heuristic search in the hypothesis space. However, recent works in relational learning claimed that the phase transition phenomenon which may occur in the subsumption test during search acts as a plateau for the heuristic search, strongly hindering its efficiency. We further develop this point by proposing a learning problem generator where it is shown that top-down and bottom-up learning strategies face a plateau during search before reaching a solution. This property is ensured by the underlying CSP generator, the RB model, that we use to exhibit a phase transition of the subsumption test. In this model, the size of the current hypothesis maintained by the learner is an order parameter of the phase transition and, as it is also the control parameter of heuristic search, the learner has to face a plateau during the problem resolution. One advantage of this model is that small relational learning problems with interesting properties can be constructed and therefore can serve as a benchmark model for complete search algorithms used in learning. We use the generator to study complete informed and non-informed search algorithms for relational learning and compare their behaviour when facing a phase transition of the subsumption test. We show that this generator exhibits the pathological case where informed learners degenerate into non-informed ones.
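The subsumption test whose phase transition is discussed above can be made concrete with a minimal backtracking checker for theta-subsumption (a sketch under our own clause encoding, not the paper's generator):

```python
# Minimal backtracking check of theta-subsumption, the NP-complete
# sub-problem discussed above. Encoding is our assumption: a clause
# is a list of (predicate, args) literals; variables are capitalised
# strings, constants are lowercase strings.

def bind(term, value, theta):
    if term[0].isupper():              # variable: bind or check consistency
        if term in theta:
            return theta[term] == value
        theta[term] = value
        return True
    return term == value               # constant: must match exactly

def subsumes(c, d, theta=None):
    """True iff some substitution maps every literal of c onto a literal of d."""
    theta = {} if theta is None else theta
    if not c:
        return True
    pred, args = c[0]
    for dpred, dargs in d:
        if dpred != pred or len(dargs) != len(args):
            continue
        t = dict(theta)                # copy so failed branches backtrack
        if all(bind(a, b, t) for a, b in zip(args, dargs)):
            if subsumes(c[1:], d, t):
                return True
    return False
```

For instance, p(X, Y), q(Y) subsumes p(a, b), q(b) via {X→a, Y→b}, while p(X, X) does not, since no single binding of X matches both arguments of p(a, b).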
inductive logic programming | 2004
Erick Alphonse
For the last ten years a lot of work has been devoted to propositionalization techniques in relational learning. These techniques change the representation of relational problems to attribute-value problems in order to use well-known learning algorithms to solve them. Propositionalization approaches have been successfully applied to various problems but are still considered ad hoc techniques. In this paper, we study these techniques in the larger context of macro-operators, as techniques to improve the heuristic search. The macro-operator paradigm enables us to propose a unified view of propositionalization and to discuss its current limitations. We show that a whole new class of approaches can be developed in relational learning, extending the idea of changes of representation to better-suited learning languages. As a first step, we propose different languages that provide a better compromise than current propositionalization techniques between the cost of building macro-operators and the cost of learning. It is known that ILP problems can be reformulated either into attribute-value or multi-instance problems. With the macro-operator approach, we see that we can target a new representation language, which we name multi-table. This new language is more expressive than attribute-value but simpler than multi-instance. Moreover, it is PAC-learnable under weak constraints. Finally, we suggest that relational learning can benefit from both the problem solving and the attribute-value learning communities by focusing on the design of effective macro-operator approaches.
european conference on machine learning | 2009
Erick Alphonse; Aomar Osmani
Relational Learning (RL) has aroused interest as a way to fill the gap between efficient attribute-value learners and the growing number of applications stored in multi-relational databases. However, current systems use general-purpose problem solvers that do not scale up well. This is in contrast with the past decade of success in the combinatorics communities, where studies of random problems in the phase transition framework made it possible to evaluate and develop better specialised algorithms, able to solve real-world applications with up to millions of variables. A number of studies have been proposed in RL, like the analysis of the phase transition of an NP-complete sub-problem, the subsumption test, but none has directly studied the phase transition of RL, as RL, in general, is Σ₂-hard.
international conference on computational linguistics | 2010
Alain-Pierre Manine; Erick Alphonse; Philippe Bessières