Publication


Featured research published by Fatiha Sadat.


international conference on computational linguistics | 2002

An approach based on multilingual thesauri and model combination for bilingual lexicon extraction

Hervé Déjean; Eric Gaussier; Fatiha Sadat

This paper focuses on exploiting different models and methods in bilingual lexicon extraction, either from parallel or comparable corpora, in specialized domains. First, special attention is given to the use of multilingual thesauri, and different search strategies based on such thesauri are investigated. Then, a method to combine the different models for bilingual lexicon extraction is presented. Our results show that the combination of the models significantly improves results, and that the use of the hierarchical information contained in our thesaurus, UMLS/MeSH, is of primary importance. Lastly, methods for bilingual terminology extraction and thesaurus enrichment are discussed.


Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages | 2000

Query term disambiguation for Web cross-language information retrieval using a search engine

Akira Maeda; Fatiha Sadat; Masatoshi Yoshikawa; Shunsuke Uemura

With the worldwide growth of the Internet, research on Cross-Language Information Retrieval (CLIR) is receiving much attention. Existing CLIR approaches based on query translation require parallel or comparable corpora for the disambiguation of translated query terms. However, such natural language resources are not readily available. In this paper, we propose a disambiguation method for dictionary-based query translation that does not depend on such scarce language resources, while achieving adequate retrieval effectiveness by using Web documents as a corpus and co-occurrence information between terms within that corpus. In the experiments, our method achieved 97% of the average precision obtained with manual translation.
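
As a rough illustration of co-occurrence-based disambiguation for dictionary-based query translation, the following Python sketch picks one translation per query term so that the chosen translations co-occur as often as possible in a corpus. It is not the authors' implementation: the toy corpus `DOCS`, the candidate lists, the helper names, and the use of the Dice coefficient as the co-occurrence measure are all assumptions made for this example; the abstract does not specify them.

```python
from itertools import product

# Hypothetical toy corpus standing in for Web documents:
# each document is reduced to the set of target-language terms it contains.
DOCS = [
    {"bank", "interest", "loan"},
    {"bank", "interest", "rate"},
    {"shore", "sea", "sand"},
    {"curiosity", "science"},
    {"bank", "river"},
]


def dice(a, b, docs):
    """Dice coefficient of two terms based on document-level co-occurrence."""
    na = sum(a in d for d in docs)
    nb = sum(b in d for d in docs)
    nab = sum(a in d and b in d for d in docs)
    return 2 * nab / (na + nb) if (na + nb) else 0.0


def disambiguate(candidates, docs):
    """Choose one translation per query term, maximizing summed pairwise Dice."""
    best, best_score = None, -1.0
    for combo in product(*candidates):
        score = sum(
            dice(combo[i], combo[j], docs)
            for i in range(len(combo))
            for j in range(i + 1, len(combo))
        )
        if score > best_score:
            best, best_score = combo, score
    return list(best)


# Assumed dictionary output: each source query term has two candidate translations.
candidates = [["shore", "bank"], ["curiosity", "interest"]]
print(disambiguate(candidates, DOCS))
```

The exhaustive search over all combinations is only practical for short queries; a real system would need a cheaper selection strategy and co-occurrence counts gathered from a search engine rather than an in-memory list.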


meeting of the association for computational linguistics | 2005

PORTAGE: A Phrase-Based Machine Translation System

Fatiha Sadat; Howard Johnson; Akakpo Agbago; George F. Foster; Roland Kuhn; Joel D. Martin; Aaron Tikuisis

This paper describes the participation of the Portage team at NRC Canada in the shared task of the ACL 2005 Workshop on Building and Using Parallel Texts. We discuss Portage, a statistical phrase-based machine translation system, and present experimental results on the four language pairs of the shared task. First, we focus on the French-English task using multiple resources and techniques. Then we describe our contribution on the Finnish-English, Spanish-English and German-English language pairs using the data provided for the shared task.


Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages | 2003

Learning Bilingual Translations from Comparable Corpora to Cross-Language Information Retrieval: Hybrid Statistics-based and Linguistics-based Approach

Fatiha Sadat; Masatoshi Yoshikawa; Shunsuke Uemura

Recent years have seen increased interest in the use and construction of large corpora. With this interest has come an expansion in their application to knowledge acquisition and bilingual terminology extraction. This paper presents an approach to bilingual lexicon extraction from non-aligned comparable corpora, combined with linguistics-based pruning, and its evaluation on Cross-Language Information Retrieval. We propose and explore a two-stage translation model for the acquisition of bilingual terminology from comparable corpora, followed by disambiguation and selection of the best translation alternatives on the basis of their morphological knowledge. Evaluations using a large-scale Japanese-English test collection and different weighting schemes of the SMART retrieval system confirmed the effectiveness of the proposed combination of the two-stage comparable corpora approach and linguistics-based pruning on Cross-Language Information Retrieval.


Artificial Intelligence in Medicine | 2005

Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval

Hervé Déjean; Eric Gaussier; Jean-Michel Renders; Fatiha Sadat

OBJECTIVES: We present in this article experiments on multi-language information extraction and access in the medical domain. For such applications, multilingual terminology plays a crucial role when working on specialized languages and specific domains.

MATERIAL AND METHODS: We propose firstly a method for enriching multilingual thesauri which extracts new terms from parallel corpora, and secondly, a new approach for bilingual lexicon extraction from comparable corpora, which uses a bilingual thesaurus as a pivot. We illustrate their use in multi-language information retrieval (English/German) in the medical domain.

RESULTS: Our experiments show that these automatically extracted bilingual lexicons are accurate enough (85% precision for term extraction) for semi-automatically enriching mono- or bilingual thesauri such as the Unified Medical Language System, and that their use in cross-language information retrieval significantly improves retrieval performance (from 22 to 40% average precision) and clearly outperforms existing bilingual lexicon resources (both general lexicons and specialized ones).

CONCLUSION: We show in this paper first that bilingual lexicon extraction from parallel corpora in the medical domain could lead to accurate, specialized lexicons, which can be used to help enrich existing thesauri, and second that bilingual lexicons extracted from comparable corpora outperform general bilingual resources for cross-language information retrieval.


meeting of the association for computational linguistics | 2003

Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval

Fatiha Sadat; Masatoshi Yoshikawa; Shunsuke Uemura

This paper presents an approach to bilingual lexicon extraction from non-aligned comparable corpora and to phrasal translation, as well as evaluations on Cross-Language Information Retrieval. A two-stage translation model is proposed for the acquisition of bilingual terminology from comparable corpora, followed by disambiguation and selection of the best translation alternatives according to their linguistics-based knowledge. Different rescoring techniques are proposed and evaluated in order to select the best phrasal translation alternatives. Results demonstrate that the proposed translation model yields better translations and that retrieval effectiveness can be achieved for the Japanese-English language pair.


workshop on statistical machine translation | 2006

PORTAGE: with Smoothed Phrase Tables and Segment Choice Models

Howard Johnson; Fatiha Sadat; George F. Foster; Roland Kuhn; Michel Simard; Eric Joanis; Samuel Larkin

Improvements to Portage and its participation in the shared task of the NAACL 2006 Workshop on Statistical Machine Translation are described. Promising ideas in phrase table smoothing and global distortion using feature-rich models are discussed, as well as numerous improvements in the software base.


international acm sigir conference on research and development in information retrieval | 2003

Enhancing cross-language information retrieval by an automatic acquisition of bilingual terminology from comparable corpora

Fatiha Sadat; Masatoshi Yoshikawa; Shunsuke Uemura

This paper presents an approach to bilingual lexicon extraction from comparable corpora and evaluations on Cross-Language Information Retrieval. We explore a bi-directional extraction of bilingual terminology primarily from comparable corpora. A combined statistics-based and linguistics-based model for selecting the best translation candidates for phrasal translation is proposed. Evaluations using a large test collection for Japanese-English revealed the proposed combination of bi-directional comparable corpora, bilingual dictionaries and transliteration, augmented with linguistics-based pruning, to be highly effective in Cross-Language Information Retrieval.


database and expert systems applications | 2002

A combined statistical query term disambiguation in cross-language information retrieval

Fatiha Sadat; Akira Maeda; Masatoshi Yoshikawa; Shunsuke Uemura

The diversity of information sources and the explosive growth of the Internet worldwide are compelling evidence of the need for information retrieval that can cross language boundaries. For the dictionary-based method in Cross-Language Information Retrieval, ambiguity arising from failure to translate queries is one of the major causes of large drops in effectiveness below monolingual performance. In this paper, we focus on query translation and disambiguation to improve the effectiveness of information retrieval and to dramatically reduce the errors such an approach normally makes. A combined statistical disambiguation method, applied both before and after translation, is proposed to avoid the wrong selection of target translations. We tested the proposed disambiguation method on French-English information retrieval. Evaluations using the TREC data collection demonstrated its effectiveness.


cross language evaluation forum | 2001

Query Expansion Techniques for the CLEF Bilingual Track

Fatiha Sadat; Akira Maeda; Masatoshi Yoshikawa; Shunsuke Uemura

This paper evaluates the effectiveness of query translation and disambiguation as well as expansion techniques on the CLEF Collections, using the SMART Information Retrieval System. We focus on query translation, disambiguation, and methods used to improve the effectiveness of information retrieval. A dictionary-based method in combination with a statistics-based method is used to avoid the problem of translation ambiguity. In addition, two expansion strategies are tested to see whether they improve the effectiveness of information retrieval: expansion via relevance feedback before and after translation, and expansion via domain feedback after translation. This method achieved 85.30% of monolingual performance in terms of average precision.
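
Figures such as "a percentage of monolingual performance in terms of average precision" compare a cross-language run against a monolingual baseline. The Python sketch below shows how such a ratio could be computed for a single query using non-interpolated average precision. The document identifiers, rankings, and relevance judgments are invented for illustration and do not come from the CLEF experiments; reported results are normally averaged over many queries.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision for a single query."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            # Precision at the rank of each relevant document retrieved.
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


# Hypothetical relevance judgments and ranked result lists.
relevant = {"d1", "d3"}
monolingual_run = ["d1", "d2", "d3", "d4"]
bilingual_run = ["d2", "d1", "d4", "d3"]

mono_ap = average_precision(monolingual_run, relevant)
bi_ap = average_precision(bilingual_run, relevant)

# Cross-language effectiveness expressed relative to the monolingual baseline.
relative = 100 * bi_ap / mono_ap
print(f"{relative:.2f}% of monolingual average precision")
```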

Collaboration


Dive into Fatiha Sadat's collaborations.

Top Co-Authors

Shunsuke Uemura
Nara Institute of Science and Technology

Akira Maeda
Ritsumeikan University

Howard Johnson
National Research Council

Roland Kuhn
National Research Council

Eric Joanis
National Research Council