Pierre Zweigenbaum
French Institute of Health and Medical Research
Publication
Featured research published by Pierre Zweigenbaum.
Computer Methods and Programs in Biomedicine | 1994
Pierre Zweigenbaum
The overall goal of MENELAS is to provide better access to the information contained in natural language patient discharge summaries (PDSs), through the design and implementation of a pilot system able to access medical reports through natural language. A first, experimental version of the MENELAS indexing prototype for French has been assembled. Its function is to encode free-text PDSs into both an internal representation and ICD-9-CM nomenclature codes. A preliminary evaluation shows the potential for reasonable coverage and precision. The MENELAS prototype will be enhanced and extended into a pilot system, which will be tested in two hospital sites.
Information Processing and Management | 1992
Marc Cavazza; Pierre Zweigenbaum
We explore the issue of extracting both explicit and implicit information from narrative technical reports through knowledge-based free text understanding. We rely on the assumption that whereas technical texts convey much implicit information, such information can be recovered through natural language analysis by building and reasoning on a model of the situation described, if both linguistic and detailed world knowledge are provided to the system. We evaluated the feasibility of this approach by designing and testing a prototype performing information extraction from clinical record sentences in a restricted medical domain: thyroid cancer care. This prototype was fully implemented and was tested on actual sentences. We present the natural language processing strategy adopted in our system with emphasis on knowledge use, as well as the preliminary results obtained.
international conference on conceptual structures | 1992
Jacques Bouaud; Pierre Zweigenbaum
In this paper, we study how several aspects of the Conceptual Graph (CG) theory can be implemented using the pattern-matching mechanisms of production systems. Usually, standard pattern matching applies to arbitrary data that, unlike CGs, do not rely on a particular theory. Reconstructions of Conceptual Graphs in terms of basic graphs have been proposed in the literature. We show that K, a graph representation language with “high-level” (rule-based) graph manipulation facilities, allows an elegant implementation of these proposals. We show how the CG projection is reconstructed from standard pattern matching. Such a mechanism provides the user with graph retrieval facilities. Moreover, K's inherent features, such as forward reasoning rules, are gracefully transferred to the resulting CG implementation with no further effort. The result is a production system that operates within the CG theory, thus providing the basis for a flexible CG processor.
Revue d'Intelligence Artificielle | 2004
Pierre Zweigenbaum
The UMLS® (Unified Medical Language System®), which could be rendered in French as « Système d'unification de la langue médicale », is an extremely rich terminological product, built in a pragmatic way, that can be approached from multiple angles. We give an overview of what the UMLS is and focus on two of its potentially conflicting aspects: its relation to ontologies and its relation to language.
Biomedical Informatics Insights | 2013
Pierre Zweigenbaum; Thomas Lavergne; Natalia Grabar; Thierry Hamon; Sophie Rosset; Cyril Grouin
Medical entity recognition is currently generally performed by data-driven methods based on supervised machine learning. Expert-based systems, where linguistic and domain expertise are directly provided to the system, are often combined with data-driven systems. We present here a case study where an existing expert-based medical entity recognition system, Ogmios, is combined with a data-driven system, Caramba, based on a linear-chain Conditional Random Field (CRF) classifier. Our case study specifically highlights the risk of overfitting incurred by an expert-based system. We observe that it prevents the combination of the two systems from obtaining improvements in precision, recall, or F-measure, and analyze the underlying mechanisms through a post-hoc feature-level analysis. Wrapping the expert-based system alone as attributes input to a CRF classifier does boost its F-measure from 0.603 to 0.710, bringing it on par with the data-driven system. The generalization of this method remains to be further investigated.
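The idea of wrapping an expert-based system as attributes input to a CRF classifier can be illustrated with a minimal sketch. It only builds per-token feature dictionaries of the kind commonly passed to linear-chain CRF toolkits; the feature names, the BIO-style tag values, and the functions `token_features` and `sentence_features` are hypothetical and are not taken from Ogmios or Caramba.

```python
def token_features(tokens, expert_tags, i):
    """Build features for token i, exposing the expert system's tag as an attribute."""
    word = tokens[i]
    feats = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "suffix3": word[-3:].lower(),
        # Output of the expert-based system, used here as one more feature.
        "expert_tag": expert_tags[i],
    }
    if i > 0:
        feats["prev.expert_tag"] = expert_tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next.expert_tag"] = expert_tags[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens, expert_tags):
    """Return one feature dictionary per token of a sentence."""
    if len(tokens) != len(expert_tags):
        raise ValueError("tokens and expert_tags must have the same length")
    return [token_features(tokens, expert_tags, i) for i in range(len(tokens))]
```

In such a setup the classifier learns how much to trust the expert tags from the training data, rather than taking them at face value.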
artificial intelligence in medicine in europe | 2003
Pierre Zweigenbaum; Natalia Grabar
Morphological knowledge (inflection, derivation, compounds) is useful for medical language processing. Some is available for medical English in the UMLS Specialist Lexicon, but not for the French language. Large corpora of medical texts can nowadays be obtained from the Web. We propose here a method, based on the cooccurrence of formally similar words, which takes advantage of such a corpus to learn morphological knowledge for French medical words. The relations obtained before filtering have an average precision of 75.6% after 5,000 word pairs. Detailed examination of the results obtained on a sample of 376 French SNOMED anatomy nouns shows that 91–94% of the proposed derived adjectives are correct, that 36% of the nouns receive a correct adjective, and that this method can add 41% more derived adjectives than SNOMED already specifies. We discuss these results and propose directions for improvement.
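A minimal sketch of the co-occurrence idea, under two assumptions that the abstract does not spell out: "formally similar" is taken to mean sharing an initial substring of at least `min_prefix` characters, and "cooccurrence" is taken to mean appearing in the same document. The function names and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def common_prefix_len(a, b):
    """Length of the longest common initial substring of a and b."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def candidate_pairs(documents, min_prefix=4):
    """Count, over all documents, pairs of distinct words that co-occur in a
    document and share a prefix of at least min_prefix characters."""
    counts = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        for w1, w2 in combinations(words, 2):
            if common_prefix_len(w1, w2) >= min_prefix:
                counts[(w1, w2)] += 1
    return counts
```

For instance, a corpus in which "abdomen" and "abdominale" appear together in two documents yields that pair with a count of 2, which makes it a candidate noun/adjective relation to be filtered further.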
Proceedings of the 4th BioNLP Shared Task Workshop | 2016
Estelle Chaix; Bertrand Dubreucq; Abdelhak Fatihi; Dialekti Valsamou; Robert Bossy; Mouhamadou Ba; Louise Deléger; Pierre Zweigenbaum; Philippe Bessières; Loïc Lepiniec; Claire Nédellec
This paper presents the SeeDev Task of the BioNLP Shared Task 2016. The purpose of the SeeDev Task is the extraction from scientific articles of the descriptions of genetic and molecular mechanisms involved in seed development of the model plant, Arabidopsis thaliana. The SeeDev task consists of the extraction of many different event types that involve a wide range of entity types so that they accurately reflect the complexity of the biological mechanisms. The corpus is composed of paragraphs selected from the full texts of relevant scientific articles. In this paper, we describe the organization of the SeeDev task, the corpus characteristics, and the metrics used for the evaluation of participant systems. We analyze and discuss the final results of the seven systems that participated in the test. The best F-score is 0.432, which is comparable to the scores achieved in similar tasks on molecular biology.
international conference on computational linguistics | 1990
Pierre Zweigenbaum; Marc Cavazza
We present here the current prototype of the text understanding system HELENE. The objective of this system is to achieve a deep understanding of small reports dealing with a restricted domain. Sentence understanding builds a model of the state of the world described, through the application of several knowledge modules: (i) LFG parsing, (ii) syntactic disambiguation based on lexical entry semantic components, (iii) assembly of semantic components and instantiation of domain entities, and (iv) construction of a world model through activation of common sense and domain knowledge.
Applied Artificial Intelligence | 1994
Marc Cavazza; Pierre Zweigenbaum
We explored the problem of achieving in-depth understanding of natural language sentences from narrative technical reports through knowledge-based free text understanding. We rely on the assumption that texts in an expert domain convey much implicit information, which can be recovered by building and reasoning on a model of the situation described with the help of both linguistic and detailed world knowledge. We describe a two-step approach to semantic analysis: the first step assembles a conceptual representation of a sentence and deals with linguistic issues; the second step builds and runs the situational model and is entirely dedicated to representation and inference. We evaluated this approach by designing a research prototype that processes sentences from clinical narratives in a medical specialty. This prototype was fully implemented and was tested on actual sentences. We give a detailed account of this implementation as well as the first results obtained.
BioNLP 2017 Workshop, Association for Computational Linguistics | 2017
Arnaud Ferré; Pierre Zweigenbaum; Claire Nédellec
We propose in this paper a semi-supervised method for labeling terms of texts with concepts of a domain ontology. The method generates continuous vector representations of complex terms in a semantic space structured by the ontology. The proposed method relies on a distributional semantics approach, which generates initial vectors for each of the extracted terms. Then these vectors are embedded in the vector space constructed from the structure of the ontology. This embedding is carried out by training a linear model. Finally, we apply a cosine similarity to determine the proximity between vectors of terms and vectors of concepts and thus to assign ontology labels to terms. We have evaluated the quality of these representations for a normalization task by using the concepts of an ontology as semantic labels. Normalization of terms is an important step to extract a part of the information contained in texts, but the vector space generated might find other applications. The performance of this method is comparable to that of the state of the art for this normalization task, opening up encouraging prospects.
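The pipeline described above (initial term vectors, a trained linear map into the ontology space, then cosine similarity against concept vectors) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the linear map is fitted here by plain stochastic gradient descent on squared error, the concept vectors are supplied as a toy dictionary rather than derived from a real ontology, and all function names are hypothetical.

```python
import math


def project(x, W):
    """Apply the linear map W (d_in x d_out) to the vector x."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def train_linear_map(X, Y, lr=0.1, epochs=500):
    """Fit W by stochastic gradient descent so that project(x, W) approximates y
    for each training pair (term vector x, concept vector y)."""
    d_in, d_out = len(X[0]), len(Y[0])
    W = [[0.0] * d_out for _ in range(d_in)]
    for _ in range(epochs):
        for x, y in zip(X, Y):
            pred = project(x, W)
            for i in range(d_in):
                for j in range(d_out):
                    W[i][j] -= lr * (pred[j] - y[j]) * x[i]
    return W


def cosine(u, v):
    """Cosine similarity of u and v (0.0 if either vector has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_concept(term_vec, W, concept_vecs):
    """Return the identifier of the concept whose vector is closest (by cosine)
    to the projection of term_vec into the ontology space."""
    projected = project(term_vec, W)
    return max(concept_vecs, key=lambda cid: cosine(projected, concept_vecs[cid]))
```

With annotated terms as training pairs, `train_linear_map` is run once and `assign_concept` is then applied to each unseen term; the nearest concept serves as its normalized label.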