Robert H. Baud
Geneva College
Publications
Featured research published by Robert H. Baud.
Artificial Intelligence in Medicine | 2003
Patrick Ruch; Robert H. Baud; Antoine Geissbuhler
In this article, we show how a set of natural language processing (NLP) tools can be combined to improve the processing of clinical records. The study concentrates on improving spelling correction, which is of major importance for quality control in the electronic patient record (EPR). As a first task, we report on the design of an improved interactive tool for correcting spelling errors. Unlike traditional systems, it uses the linguistic context (both semantic and syntactic) to improve the correction strategy. The system is organized into three modules. Module 1 is based on a classical spelling checker: it is context-independent and simply measures a string-edit distance between a misspelled word and a list of well-formed words. Module 2 attempts to rank the set of candidates provided by the first module more relevantly, using morpho-syntactic disambiguation tools. Module 3 processes words with the same part-of-speech (POS) and applies word-sense (WS) disambiguation in order to rerank the set of candidates. As a second task, we show how this improved interactive spell checker can be cast as a fully automatic system by the adjunction of another NLP module: a named-entity (NE) extractor, i.e. a tool able to identify words such as patient and physician names. This module is used to avoid the replacement of named entities when the system is not used in interactive mode. Results confirm that using the linguistic context can improve interactive spelling correction, and justify the use of a named-entity recognizer to conduct fully automatic spelling correction. We conclude that NLP is mature enough to help information processing in the EPR.
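Module 1 of the pipeline described above is a context-independent candidate generator ranked by string-edit distance. A minimal sketch of that idea follows; the lexicon, the distance threshold, and the function names are illustrative assumptions, not the authors' implementation:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def candidates(misspelled: str, lexicon: list[str], max_dist: int = 2) -> list[str]:
    """Context-independent candidate list, sorted by edit distance."""
    scored = [(edit_distance(misspelled, w), w) for w in lexicon]
    return [w for d, w in sorted(scored) if d <= max_dist]

# A toy lexicon; Modules 2 and 3 would rerank this list using context.
lexicon = ["fracture", "fissure", "feature", "facture"]
print(candidates("fracure", lexicon))  # ['fracture', 'facture', 'feature']
```

Modules 2 and 3 would then rerank this candidate list using POS and word-sense evidence from the surrounding sentence.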
International Journal of Medical Informatics | 2000
Beatrice Trombert-Paviot; Jean Marie Rodrigues; J.E. Rogers; Robert H. Baud; E.J. van der Haring; Anne-Marie Rassinoux; V. Abrial; Lucienne Clavel; H. Idir
GALEN has developed a new generation of terminology tools based on a language-independent concept reference model that uses a compositional formalism, allowing computer processing and multiple reuses. During the 4th Framework Programme project Galen-In-Use, we applied this modelling and these tools to the development of a new multipurpose coding system for surgical procedures (CCAM) in France. On the one hand, we contributed to a language-independent knowledge repository for multicultural Europe. On the other hand, we supported the traditional, highly labour-intensive process of creating a new coding system in medicine with artificial intelligence tools that use a medically oriented recursive ontology and natural language processing. We used an integrated piece of software named CLAW to process French professional medical language rubrics, produced by the national colleges of surgeons, into intermediate dissections and then into the GRAIL reference ontology model representation. From this language-independent concept model representation we generate, on the one hand, controlled French natural language to support the finalization of the linguistic labels in relation to the meanings of the conceptual system structure. On the other hand, the third-generation classification manager proves very powerful for retrieving the initial professional rubrics with different categories of concepts within a semantic network.
International Journal of Medical Informatics | 1999
François Borst; Ron D. Appel; Robert H. Baud; Yves Ligier; Jean-Raoul Scherrer
Since its birth in 1978, DIOGENE, the hospital information system of Geneva University Hospital, has been constantly evolving, with a major change in 1995, when it migrated from a centralized to an open distributed architecture. For a few years, the hospital has had to face a health policy revolution, with both economic constraints and the opening of the healthcare network. DIOGENE plays a significant role by integrating four axes of knowledge: (1) the medico-economical context, for better understanding and influencing resource consumption; (2) the whole set of patient reports and documents (reports, encoded summaries, clinical findings, images, lab data, etc.), i.e. patient-dependent knowledge, in a vision integrating time and space; (3) external knowledge bases such as Medline (patient-independent knowledge); and (4) the integration of this patient-dependent and patient-independent knowledge in a case-based reasoning format, providing on the physician's desktop all relevant information to help him take the most appropriate decision.
International Journal of Medical Informatics | 2005
Pierre Zweigenbaum; Robert H. Baud; Anita Burgun; Fiammetta Namer; Éric Jarrousse; Natalia Grabar; Patrick Ruch; Franck Le Duff; Jean-François Forget; Magaly Douyère; Stéfan Jacques Darmoni
Lexical resources for medical language, such as lists of words with inflectional and derivational information, are publicly available for the English language through the UMLS Specialist Lexicon. The goal of the UMLF project is to pool and unify existing resources and to add extensively to them by exploiting medical terminologies and corpora, resulting in a Unified Medical Lexicon for French. We present here the current status of the project.
Artificial Intelligence in Medicine in Europe | 2003
Patrick Ruch; Robert H. Baud; Antoine Geissbuhler
In this paper, we report on the fusion of simple retrieval strategies with thesaural resources in order to perform large-scale text categorization tasks. Unlike most related systems, which rely on training data in order to infer text-to-concept relationships, our approach can be applied with any controlled vocabulary and does not use any training data. The first classification module uses a traditional vector-space retrieval engine, which has been fine-tuned for the task, while the second classifier is based on regular variations of the concept list. For evaluation purposes, the system uses a sample of MEDLINE and the Medical Subject Headings (MeSH) terminology as its collection of concepts. Preliminary results show that the performance of the hybrid system is significantly improved compared to each single system. For the top returned concepts, the system reaches performance comparable to machine learning systems, while genericity and scalability issues are clearly in favor of the learning-free approach. We draw conclusions on the importance of hybrid strategies combining data-poor classifiers and knowledge-based terminological resources for general text mapping tasks.
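The first classification module described above is a training-free vector-space match between a document and the labels of a controlled vocabulary. A minimal sketch of that idea follows; the concept names and the tf-idf variant are illustrative assumptions, not actual MeSH entries or the authors' tuned engine:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """tf-idf weight per term per document, with idf computed over the given collection."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: (c / len(d)) * math.log(1 + n / df[t]) for t, c in Counter(d).items()}
            for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy "controlled vocabulary": concept label -> tokenized label.
concepts = {"myocardial infarction": ["myocardial", "infarction"],
            "diabetes mellitus": ["diabetes", "mellitus"]}
text = "acute myocardial infarction with st elevation".split()
vecs = tfidf_vectors(list(concepts.values()) + [text])
scores = {name: cosine(vecs[i], vecs[-1]) for i, name in enumerate(concepts)}
print(max(scores, key=scores.get))  # best-matching concept
```

Because no text-to-concept mapping is learned, swapping in a different vocabulary only means replacing the `concepts` table, which is the genericity argument made above.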
International Journal of Medical Informatics | 2002
Patrick Ruch; Robert H. Baud; Antoine Geissbuhler
Unlike journal corpora, which are supposed to be carefully reviewed before publication, the quality of the documents in a patient record is often corrupted by misspelled words, unconventional spellings, and abbreviations. After a survey of the domain, the paper focuses on evaluating the effect of such corruption on an information retrieval (IR) engine. The IR system uses a classical bag-of-words approach, with stems as representation items and term frequency-inverse document frequency (tf-idf) as the weighting scheme; we pay special attention to the normalization factor. First results show that even low corruption levels (3%) affect retrieval effectiveness (by 4-7%), whereas higher corruption levels can affect retrieval effectiveness by 25%. We then show that applying an improved automatic spelling correction system to the corrupted collection can almost restore the retrieval effectiveness of the engine.
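A toy illustration of the mechanism behind these numbers, not the paper's experiment: under exact stem lookup, a corrupted surface form simply stops matching its index term, so each corrupted query or document token silently reduces recall. The corruption model and index here are invented for the sketch:

```python
import random

def corrupt(tokens, rate, rng):
    """Replace one character with 'x' in roughly a fraction `rate` of tokens."""
    out = []
    for tok in tokens:
        if len(tok) > 1 and rng.random() < rate:
            i = rng.randrange(len(tok))
            tok = tok[:i] + "x" + tok[i + 1:]
        out.append(tok)
    return out

index = {"fracture", "femur", "displaced"}          # toy term index
query = ["displaced", "fracture", "of", "the", "femur"]
noisy = corrupt(query, 0.25, random.Random(0))
hits = [t for t in noisy if t in index]             # only uncorrupted terms match
print(noisy, hits)
```

A spelling-correction pass over `noisy` before lookup is what restores the matches, which is the effect measured above.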
Journal of the American Medical Informatics Association | 2000
Christian Lovis; Robert H. Baud
OBJECTIVE: The authors consider the problem of exact string pattern matching using algorithms that do not require any preprocessing. To choose the most appropriate algorithm, distinctive features of the medical language must be taken into account. The characteristics of medical language are emphasized in this regard, the best algorithm of those reviewed is proposed, and detailed evaluations of time complexity for processing medical texts are provided.
DESIGN: The authors first illustrate and discuss the techniques of various string pattern-matching algorithms. Next, the source code and the behavior of representative exact string pattern-matching algorithms are presented in a comprehensive manner to promote their implementation. Detailed explanations of the use of various techniques to improve performance are given.
MEASUREMENTS: Real-time measures of time complexity with English medical texts are presented. They lead to results distinct from those found in the computer science literature, which are typically computed with normally distributed texts.
RESULTS: The Boyer-Moore-Horspool algorithm achieves the best overall results when used with medical texts. This algorithm usually performs at least twice as fast as the other algorithms tested.
CONCLUSION: The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used. Considering the growing amount of text handled in the electronic patient record, it is worth implementing this efficient algorithm.
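The Boyer-Moore-Horspool algorithm recommended above owes its speed to a bad-character shift table that lets the search skip ahead by up to the pattern length on each mismatch. A compact textbook-style sketch, simplified from the standard formulation rather than taken from the authors' evaluated source code:

```python
def horspool(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return -1 if m else 0
    # Shift for each character: distance from its last occurrence (excluding
    # the final pattern position) to the end of the pattern.
    shift = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            return i
        # Characters absent from the pattern allow a full-length jump of m.
        i += shift.get(text[i + m - 1], m)
    return -1

print(horspool("acute myocardial infarction", "myocardial"))  # 6
```

The large-jump case dominates on natural-language text, where most alphabet characters do not occur in a typical search pattern; this is consistent with the paper's observation that real medical texts behave differently from uniformly distributed test data.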
Conference on Computational Natural Language Learning | 2000
Patrick Ruch; Robert H. Baud; Pierrette Bouillon; Gilbert Robert
In this paper we describe the construction of a part-of-speech tagger for both medical document retrieval purposes and XP extraction. We have therefore designed a double system: for retrieval purposes, we rely on a rule-based architecture, called minimal commitment, which can be complemented by a data-driven tool (a hidden Markov model, HMM) when full disambiguation is necessary.
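A minimal-commitment tagger, as described above, keeps the full set of candidate tags for each token and prunes only where a rule licenses it; anything still ambiguous would be handed to the data-driven (HMM) stage. The lexicon, tagset, and rule below are invented for illustration:

```python
# Toy ambiguity-class lexicon: token -> set of possible POS tags.
LEXICON = {"the": {"DET"}, "patient": {"NOUN", "ADJ"}, "presents": {"VERB", "NOUN"}}

def tag(tokens):
    """Minimal-commitment tagging: prune tag sets only where a rule applies."""
    tags = [set(LEXICON.get(t, {"UNK"})) for t in tokens]
    for i in range(1, len(tokens)):
        # Rule: a word directly after an unambiguous determiner that can be
        # a noun is committed to NOUN.
        if tags[i - 1] == {"DET"} and "NOUN" in tags[i]:
            tags[i] = {"NOUN"}
    return tags

print(tag(["the", "patient", "presents"]))
```

Note that "presents" stays ambiguous ({VERB, NOUN}) because no rule fires; that residual ambiguity is exactly what a downstream HMM would resolve.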
International Journal of Medical Informatics | 2007
Fiammetta Namer; Robert H. Baud
This paper addresses the issue of how semantic information can be automatically assigned to compound terms, i.e. both a definition and a set of semantic relations. This is particularly crucial when elaborating multilingual databases and when developing cross-language information retrieval systems. The paper shows how morphosemantics can contribute to the constitution of multilingual lexical networks in biomedical corpora. It presents a system capable of labelling terms with morphologically related words, i.e. providing them with a definition, and grouping them according to synonymy, hyponymy and proximity relations. The approach requires the interaction of three techniques: (1) a language-specific morphosemantic parser, (2) a multilingual table defining basic relations between word roots and (3) a set of language-independent rules to draw up the list of related terms. This approach has been fully implemented for French, on a biomedical lexicon of about 29,000 terms, resulting in more than 3000 lexical families. A validation of the results against a file manually annotated by experts of the domain is presented, followed by a discussion of our method.
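The morphosemantic parsing step above can be pictured as splitting a compound term against a table of word roots, so that the recovered glosses can seed a definition and relation building. The root table, glosses, and greedy strategy here are invented for the sketch, not the project's actual parser or data:

```python
# Toy root table: morpheme -> gloss (None marks a meaningless linking vowel).
ROOTS = {"gastr": "stomach", "enter": "intestine", "hepat": "liver",
         "itis": "inflammation", "o": None}

def decompose(term):
    """Greedy left-to-right split of a term into known roots, longest match first."""
    parts, i = [], 0
    while i < len(term):
        for j in range(len(term), i, -1):
            if term[i:j] in ROOTS:
                parts.append(term[i:j])
                i = j
                break
        else:
            return None  # unanalysable residue
    return [ROOTS[p] for p in parts if ROOTS[p]]

print(decompose("gastroenteritis"))  # ['stomach', 'intestine', 'inflammation']
```

Terms sharing root glosses (e.g. anything containing the "inflammation" morpheme) can then be grouped into the lexical families mentioned above.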
International Journal of Medical Informatics | 2006
Nigel Collier; Adeline Nazarenko; Robert H. Baud; Patrick Ruch
We survey a set of recent advances in natural language processing applied to biomedical applications, which were presented at an international workshop in Geneva, Switzerland, in 2004. While text mining applied to molecular biology and the biomedical literature can report several interesting achievements, we observe that studies applied to clinical content are still rare. In general, we argue that clinical corpora, including electronic patient records, must be made available to fill the gap between bioinformatics and medical informatics.