Patrick Ruch
École Polytechnique Fédérale de Lausanne
Publications
Featured research published by Patrick Ruch.
Artificial Intelligence in Medicine | 2003
Patrick Ruch; Robert H. Baud; Antoine Geissbuhler
In this article, we show how a set of natural language processing (NLP) tools can be combined to improve the processing of clinical records. The study concentrates on improving spelling correction, which is of major importance for quality control in the electronic patient record (EPR). As a first task, we report on the design of an improved interactive tool for correcting spelling errors. Unlike traditional systems, it uses the linguistic context (both semantic and syntactic) to improve the correction strategy. The system is organized into three modules. Module 1 is based on a classical spelling checker: it is context-independent and simply measures a string edit distance between a misspelled word and a list of well-formed words. Module 2 attempts to rank the set of candidates provided by the first module more relevantly, using morpho-syntactic disambiguation tools. Module 3 processes words with the same part-of-speech (POS) and applies word-sense (WS) disambiguation in order to rerank the set of candidates. As a second task, we show how this improved interactive spell checker can be turned into a fully automatic system by adding another NLP module: a named-entity (NE) extractor, i.e. a tool able to identify words such as patient and physician names. This module is used to avoid replacing named entities when the system is not used in an interactive mode. Results confirm that using the linguistic context can improve interactive spelling correction, and justify the use of a named-entity recognizer for fully automatic spelling correction. It is concluded that NLP is mature enough to support information processing in the EPR.
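As a rough illustration of the module-1 behaviour described in this abstract, the sketch below generates correction candidates by ranking lexicon entries on string edit distance. It is a minimal, hypothetical reconstruction, not the published system: the lexicon, the cost weights, and the function names are all illustrative assumptions.

```python
# Minimal sketch of a context-independent (module-1 style) spelling checker:
# rank well-formed words by Levenshtein distance to the misspelled input.
# Lexicon and input are toy examples, not the paper's data.

def edit_distance(a: str, b: str) -> int:
    """Classical Levenshtein distance (insert/delete/substitute, cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def candidates(misspelled: str, lexicon: list[str], k: int = 5) -> list[str]:
    """Return the k well-formed words closest to the misspelled input."""
    return sorted(lexicon, key=lambda w: edit_distance(misspelled, w))[:k]

# Hypothetical fragment of a clinical lexicon
lexicon = ["diabetes", "diagnosis", "dialysis", "diuretic"]
print(candidates("diabtes", lexicon))  # ['diabetes', 'dialysis', ...]
```

Modules 2 and 3 would then rerank such a candidate list using POS and word-sense evidence from the surrounding context.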
International Journal of Medical Informatics | 2002
Patrick Ruch; Robert H. Baud; Antoine Geissbuhler
Unlike journal corpora, which are carefully reviewed before publication, documents in a patient record are often corrupted by misspelled words and by conventional graphies or abbreviations. After a survey of the domain, the paper focuses on evaluating the effect of such corruption on an information retrieval (IR) engine. The IR system uses a classical bag-of-words approach, with stems as representation items and term frequency-inverse document frequency (tf-idf) as the weighting scheme; we pay special attention to the normalization factor. First results show that even low corruption levels (3%) affect retrieval effectiveness (by 4-7%), whereas higher corruption levels can degrade retrieval effectiveness by 25%. We then show that an improved automatic spelling correction system, applied to the corrupted collection, can almost restore the retrieval effectiveness of the engine.
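For readers unfamiliar with the weighting scheme mentioned above, here is a minimal tf-idf sketch with cosine length normalization (the normalization factor the abstract emphasizes). Stemming is elided, plain tokens stand in for stems, and all documents are toy data; this is an assumption-laden illustration, not the evaluated engine.

```python
# Hedged sketch of a bag-of-words weighting: tf-idf with cosine normalization.
import math
from collections import Counter

docs = [["renal", "failure", "dialysis"],
        ["renal", "transplant"],
        ["cardiac", "failure"]]

N = len(docs)
# Document frequency: number of documents containing each term
df = Counter(term for d in docs for term in set(d))

def tfidf_vector(doc: list[str]) -> dict[str, float]:
    """tf-idf weights, divided by vector length (the normalization factor)."""
    tf = Counter(doc)
    w = {t: tf[t] * math.log(N / df[t]) for t in tf}
    norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
    return {t: v / norm for t, v in w.items()}

print(tfidf_vector(docs[0]))
```

Corrupting the tokens (misspelled stems no longer match index terms) directly lowers the dot products such vectors produce, which is one way to see why even a 3% corruption level hurts retrieval.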
international conference on computational linguistics | 2002
Patrick Ruch
The study presents the design and evaluation of an improved IR system able to cope with textual misspellings. After selecting an optimal weighting scheme for the engine, we evaluate the effect of misspellings on retrieval effectiveness. We then compare the improvement brought to the engine by two different non-interactive spelling correction strategies: a classical one, based on a string-to-string edit distance computation, and a contextual one, which adds linguistically motivated features to the string distance module. The results for the latter suggest that the loss of average precision on degraded texts can be reduced to a few percent (4%).
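The contrast between the two strategies can be sketched as follows. The bigram table is a deliberately crude stand-in for the paper's linguistically motivated (POS and word-sense) features, and every name, score, and weight here is a hypothetical example.

```python
# Sketch: classical ranking by string similarity alone vs. contextual ranking
# that mixes in a linguistic score for the preceding word. Toy data only.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for an edit-distance module."""
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical bigram scores, standing in for POS / word-sense features
bigram = {("renal", "failure"): 0.9, ("renal", "failing"): 0.1}

def rank_classical(misspelled: str, lexicon: list[str]) -> list[str]:
    return sorted(lexicon, key=lambda w: -similarity(misspelled, w))

def rank_contextual(misspelled: str, prev_word: str,
                    lexicon: list[str], alpha: float = 0.5) -> list[str]:
    # Boost candidates that are plausible after the preceding word.
    score = lambda w: similarity(misspelled, w) + alpha * bigram.get((prev_word, w), 0.0)
    return sorted(lexicon, key=lambda w: -score(w))

lexicon = ["failure", "failing", "fixture"]
print(rank_classical("failre", lexicon))            # string distance only
print(rank_contextual("failre", "renal", lexicon))  # context boosts "failure"
```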
international conference on computational linguistics | 2004
Patrick Ruch
We report on the development of a cross-language information retrieval system, which translates user queries by categorizing them into terms listed in a controlled vocabulary. Unlike usual automatic text categorization systems, which rely on data-intensive models induced from large training data, our automatic text categorization tool applies data-independent classifiers: a vector-space engine and a pattern matcher are combined to improve the ranking of Medical Subject Headings (MeSH). The categorizer also benefits from the availability of large thesauri, where variants of MeSH terms can be found. For evaluation, we use an English collection of MedLine records, OHSUMED. French OHSUMED queries, translated from the original English queries by domain experts, are mapped into French MeSH terms; we then use the MeSH controlled vocabulary as an interlingua to translate the French MeSH terms into English MeSH terms, which are finally used to query the OHSUMED document collection. The first part of the study focuses on the text-to-MeSH categorization task, using a set of MedLine abstracts as input documents to tune the categorization system. The second part compares the performance of a machine translation-based cross-language information retrieval (CLIR) system with the categorization-based system: the former results in a CLIR ratio close to 60%, while the latter achieves a ratio above 80%. A final experiment, which combines both approaches, achieves a result above 90%.
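One possible reading of the combined classifier is a simple linear fusion of the two component rankings over MeSH headings, as sketched below. The scoring functions, fusion weight, and example scores are assumptions for illustration; the published system's actual combination method may differ.

```python
# Hypothetical fusion of a vector-space engine's scores with a pattern
# matcher's scores to rank MeSH headings. All values are illustrative.

def combine_rankings(vs_scores: dict[str, float],
                     pm_scores: dict[str, float],
                     weight: float = 0.5) -> list[tuple[str, float]]:
    """Linearly merge two score maps over MeSH terms; higher is better."""
    terms = set(vs_scores) | set(pm_scores)
    merged = {t: weight * vs_scores.get(t, 0.0)
                 + (1 - weight) * pm_scores.get(t, 0.0)
              for t in terms}
    return sorted(merged.items(), key=lambda kv: -kv[1])

# Assumed scores for one query, before mapping French MeSH to English MeSH
vs = {"Hypertension": 0.8, "Kidney Diseases": 0.6}
pm = {"Hypertension": 1.0, "Renal Dialysis": 0.4}
print(combine_rankings(vs, pm))  # [('Hypertension', 0.9), ...]
```

Because neither component needs training data, such a fusion stays data-independent, which is the property the abstract highlights.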
Archive | 2004
Nigel Collier; Patrick Ruch; Adeline Nazarenko
NTCIR | 2010
Douglas Teodoro; Emilie Pasche; Dina Vishnyakova; Julien Gobeill; Patrick Ruch; Christian Lovis
CLEF (Working Notes) | 2007
Xin Zhou; Julien Gobeill; Patrick Ruch; Henning Müller
CLEF (Working Notes) | 2009
Julien Gobeill; Douglas Theodoro; Patrick Ruch
CLEF (Working Notes) | 2004
Henning Müller; Antoine Geissbuhler; Patrick Ruch
CLEF (Working Notes) | 2009
Julien Gobeill; Douglas Theodoro; Emilie Pasche; Patrick Ruch