Evelyne Tzoukermann | Researchain

Publication

Featured research published by Evelyne Tzoukermann.

Archive | 1999

NLP for Term Variant Extraction: Synergy Between Morphology, Lexicon, and Syntax

Christian Jacquemin; Evelyne Tzoukermann

We present a natural language processing (NLP) approach to automatic indexing over a controlled vocabulary that accounts for term variation. The approach combines a part-of-speech tagger, a generator of morphologically related forms, and a shallow transformational parser. The system is applied to French; it is trained on newspaper articles and tested on scientific literature.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1999

Information retrieval based on context distance and morphology

Hongyan Jing; Evelyne Tzoukermann

We present an approach to information retrieval based on context distance and morphology. Context distance is a measure we use to assess the closeness of word meanings. This context distance model measures semantic distances between words using the local contexts of words within a single document as well as the lexical co-occurrence information in the set of documents to be retrieved. We also propose to integrate the context distance model with morphological analysis in determining word similarity so that the two can enhance each other. Using the standard vector-space model, we evaluated the proposed method on a subset of the TREC-4 corpus (the AP88 and AP90 collections; 158,240 documents, 49 queries). Results show that this method improves 11-point average precision by 8.6%.
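
The 11-point average precision reported above is a standard TREC-style measure: interpolated precision is read off at recall levels 0.0, 0.1, ..., 1.0 and the eleven values are averaged. The sketch below computes it for a single ranked list, assuming binary relevance judgments and at least one relevant document; the function name is hypothetical and this is not the authors' evaluation code. Per-query values would then be averaged over all queries.

```python
from typing import List


def eleven_point_average_precision(ranked_relevance: List[bool],
                                   total_relevant: int) -> float:
    """11-point interpolated average precision for one query.

    ranked_relevance[k] is True if the document at rank k + 1 is relevant;
    total_relevant is the number of relevant documents for the query (> 0).
    """
    precisions: List[float] = []
    recalls: List[float] = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        precisions.append(hits / rank)
        recalls.append(hits / total_relevant)

    points: List[float] = []
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= this level.
        candidates = [p for p, r in zip(precisions, recalls) if r >= level]
        points.append(max(candidates) if candidates else 0.0)
    return sum(points) / 11


# Relevant documents at ranks 1 and 3, two relevant documents in total.
print(eleven_point_average_precision([True, False, True, False], 2))
```

For this example list, interpolated precision is 1.0 at recall levels 0.0 through 0.5 and 2/3 at levels 0.6 through 1.0, giving (6 × 1 + 5 × 2/3) / 11 = 28/33 ≈ 0.848.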

International Conference on Acoustics, Speech, and Signal Processing | 1992

A speech understanding system based on statistical representation of semantics

Roberto Pieraccini; Evelyne Tzoukermann; Zakhar Gorelov; Jean-Luc Gauvain; Esther Levin; Chin-Hui Lee; Jay G. Wilpon

An understanding system, designed for both speech and text input, has been implemented based on a statistical representation of task-specific semantic knowledge. The core of the system is the conceptual decoder, which extracts the words and their association to the conceptual structure of the task directly from the acoustic signal. The conceptual information, which is also used to clarify the English sentences, is encoded following a statistical paradigm. A template generator and an SQL (structured query language) translator process the sentence and produce SQL code for querying a relational database. Results of the system on the official DARPA test are given.

Meeting of the Association for Computational Linguistics | 1997

Expansion of Multi-Word Terms for Indexing and Retrieval Using Morphology and Syntax

Christian Jacquemin; Judith L. Klavans; Evelyne Tzoukermann

We present a system for the automatic production of controlled index terms using linguistically motivated techniques. The system includes a finite-state part-of-speech tagger, a derivational morphological processor for analysis and generation, and a unification-based shallow-level parser using transformational rules over syntactic patterns. The contribution of this research is the successful combination of parsing over a seed term list with derivational morphology to achieve greater coverage of multi-word terms for indexing and retrieval. Final results are evaluated for precision and recall, and implications for indexing and retrieval are discussed.

International Conference on Computational Linguistics | 1990

A finite-state morphological processor for Spanish

Evelyne Tzoukermann; Mark Liberman

A finite transducer that processes Spanish inflectional and derivational morphology is presented. The system handles both generation and analysis of tens of millions of inflected forms. Lexical and surface (orthographic) representations of the words are linked by a program that interprets a finite directed graph whose arcs are labelled by n-tuples of strings. Each of about 55,000 base forms requires at least one arc in the graph. Representing the inflectional and derivational possibilities for these forms imposed an overhead of only about 3000 additional arcs, of which about 2500 represent (phonologically predictable) stem allomorphy, so that we pay a storage price of about 5% for compiling these forms offline. A simple interpreter for the resulting automaton processes several hundred words per second on a Sun4.
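
To make the idea of a directed graph whose arcs are labelled by string tuples concrete, here is a toy transducer using pairs (lexical, surface). The same graph is walked matching one side of each label for analysis and the other side for generation. The two verbs, the tag names (`+V+Pres+1Sg` and so on), and the state numbers are invented for illustration, and the extra `cuent` arc mimics how stem allomorphy (contar, cuento) costs a few additional arcs. This is a sketch under those assumptions, not the Tzoukermann and Liberman system.

```python
from typing import Dict, List, Set, Tuple

# An arc is ((lexical_label, surface_label), target_state).
Arc = Tuple[Tuple[str, str], int]

# Toy graph for two Spanish verbs (hypothetical labels and states).
ARCS: Dict[int, List[Arc]] = {
    0: [(("hablar", "habl"), 1),
        (("contar", "cuent"), 3),   # allomorphic stem used under stress
        (("contar", "cont"), 4)],   # regular stem
    1: [(("+V+Pres+1Sg", "o"), 2),
        (("+V+Pres+2Sg", "as"), 2),
        (("+V+Inf", "ar"), 2)],
    3: [(("+V+Pres+1Sg", "o"), 2),
        (("+V+Pres+2Sg", "as"), 2)],
    4: [(("+V+Inf", "ar"), 2)],
}
FINAL: Set[int] = {2}
LEXICAL, SURFACE = 0, 1


def transduce(text: str, src: int, dst: int) -> List[str]:
    """Return every output string the graph pairs with `text`.

    src/dst select which side of each arc label is matched and emitted.
    """
    results: List[str] = []

    def walk(state: int, rest: str, out: str) -> None:
        if not rest and state in FINAL:
            results.append(out)
        for labels, target in ARCS.get(state, []):
            if rest.startswith(labels[src]):
                walk(target, rest[len(labels[src]):], out + labels[dst])

    walk(0, text, "")
    return results


def analyze(surface: str) -> List[str]:
    return transduce(surface, SURFACE, LEXICAL)


def generate(lexical: str) -> List[str]:
    return transduce(lexical, LEXICAL, SURFACE)


print(analyze("cuento"))         # ['contar+V+Pres+1Sg']
print(generate("contar+V+Inf"))  # ['contar']
```

The surface form "contas" is rejected because the regular stem arc `cont` only leads to the infinitive ending, which is the kind of constraint the allomorphy arcs encode.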

Conference on Computational Natural Language Learning | 2001

Combining linguistic and machine learning techniques for email summarization

Smaranda Muresan; Evelyne Tzoukermann; Judith L. Klavans

This paper shows that linguistic techniques along with machine learning can extract high quality noun phrases for the purpose of providing the gist or summary of email messages. We describe a set of comparative experiments using several machine learning algorithms for the task of salient noun phrase extraction. Three main conclusions can be drawn from this study: (i) the modifiers of a noun phrase can be semantically as important as the head for the task of gisting, (ii) linguistic filtering improves the performance of machine learning algorithms, (iii) a combination of classifiers improves accuracy.
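
The abstract does not say how the classifiers were combined; simple majority voting is one common scheme, sketched here under that assumption. Each hypothetical classifier labels every candidate noun phrase as salient (1) or not salient (0), and the combined label is the most frequent vote.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier labels by majority vote.

    predictions[k][i] is the label classifier k assigns to noun phrase i.
    With an odd number of binary classifiers there are no ties; otherwise
    Counter.most_common breaks ties by first occurrence.
    """
    combined: List[int] = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Three hypothetical classifiers labelling three noun phrases.
print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))  # [1, 0, 1]
```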

Machine Translation | 1995

Combining corpus and machine-readable dictionary data for building bilingual lexicons

Judith L. Klavans; Evelyne Tzoukermann

This paper describes and discusses some theoretical and practical problems arising from developing a system that combines the structured but incomplete information from machine-readable dictionaries (MRDs) with the unstructured but more complete information available in corpora for the creation of a bilingual lexical data base. We present a methodology to integrate information from both sources into a single lexical data structure. The BICORD system (BIlingual CORpus-enhanced Dictionaries) involves linking entries in the Collins English-French and French-English bilingual dictionary with a large English-French and French-English bilingual corpus. We have concentrated on the class of action verbs of movement, building on earlier work on cross-language lexical correspondences specific to this verb class (Klavans and Tzoukermann, 1989), (Klavans and Tzoukermann, 1990a), (Klavans and Tzoukermann, 1990b). We first examine the way prototypical verbs of movement are translated in the Collins-Robert (Atkins, Duval, and Milne, 1978) bilingual dictionary, and then analyze the behavior of some of these verbs in a large bilingual corpus. We incorporate the results of linguistic research on the theory of verb types to motivate corpus analysis coupled with data from MRDs for the purpose of establishing lexical correspondences with the full range of associated translations, and with statistical data attached to the relevant nodes.

International Conference on Computational Linguistics | 1990

The BICORD system: combining lexical information from bilingual corpora and machine readable dictionaries

Judith L. Klavans; Evelyne Tzoukermann

Our goal is to explore methods for combining structured but incomplete information from dictionaries with the unstructured but more complete information available in corpora for the creation of a bilingual lexical data base. This paper concentrates on the class of action verbs of movement, and builds on earlier work on cross-language lexical correspondences specific to this verb class. The languages we explore here are English and French. We first examine the way prototypical verbs of movement are translated in the Collins-Robert (Collins 1978, henceforth CR) bilingual dictionary. We then analyze the behavior of some of these verbs in a large bilingual corpus. We take advantage of the results of linguistic research on verb types (e.g. Levin, to appear) coupled with data from machine-readable dictionaries to motivate corpus-based text analysis for the purpose of establishing lexical correspondences with the full range of associated translations, and then attach frequencies to translations.
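
Attaching frequencies to translations can be pictured as counting how often each target word is aligned with a source verb in the bilingual corpus and normalizing per verb. The sketch assumes the corpus has already been reduced to (English verb, French translation) pairs; the function name and the pairs below are invented for illustration and are not BICORD data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def translation_frequencies(
        pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Relative frequency of each translation, per source word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return {
        source: {t: n / sum(c.values()) for t, n in c.items()}
        for source, c in counts.items()
    }


# Hypothetical aligned verb pairs.
pairs = [("run", "courir"), ("run", "courir"), ("run", "diriger"),
         ("run", "courir"), ("walk", "marcher")]
print(translation_frequencies(pairs))
# {'run': {'courir': 0.75, 'diriger': 0.25}, 'walk': {'marcher': 1.0}}
```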

Human Language Technology | 1992

Progress report on the Chronus system: ATIS benchmark results

Roberto Pieraccini; Evelyne Tzoukermann; Zakhar Gorelov; Esther Levin; Chin-Hui Lee; Jean-Luc Gauvain

The speech understanding system we propose in this paper is based on the stochastic modeling of a sentence as a sequence of elemental units that represent its meaning. Under this paradigm, the original meaning of a sentence can be decoded using a dynamic programming algorithm, although the small amount of training data currently available suggested integrating the decoder with a more traditional technique. Nevertheless, the advantage of this method lies in a framework in which a closed training loop reduces the amount of human supervision in the design phase of the understanding component. The results reported here for the February 1992 DARPA ATIS test are extremely promising, considering the small amount of hand tuning the system required.
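
Decoding a sentence as a sequence of meaning units with dynamic programming can be illustrated by a standard Viterbi search over a hidden Markov model whose hidden states are concepts and whose observations are words. The concept names, all probabilities, and the floor value for unseen words below are invented for illustration; the abstract does not describe the Chronus decoder's actual model or training procedure, so treat this as a generic sketch of the technique.

```python
import math
from typing import Dict, List, Optional, Tuple

Table = Dict[str, Dict[str, float]]


def viterbi(words: List[str], concepts: List[str], start: Dict[str, float],
            trans: Table, emit: Table, floor: float = 1e-6) -> List[str]:
    """Most probable concept sequence for `words` (log-space Viterbi).

    Words a concept never emits get probability `floor` under that concept.
    """
    def log_emit(c: str, w: str) -> float:
        return math.log(emit[c].get(w, floor))

    # best[i][c] = (log score of best path ending in c at word i, previous concept)
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {c: (math.log(start[c]) + log_emit(c, words[0]), None) for c in concepts}
    ]
    for i in range(1, len(words)):
        column: Dict[str, Tuple[float, Optional[str]]] = {}
        for c in concepts:
            prev, score = max(
                ((p, best[i - 1][p][0] + math.log(trans[p][c])) for p in concepts),
                key=lambda pair: pair[1])
            column[c] = (score + log_emit(c, words[i]), prev)
        best.append(column)

    # Trace back from the best final concept.
    path = [max(concepts, key=lambda c: best[-1][c][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Hypothetical ATIS-like toy model.
CONCEPTS = ["QUERY", "ORIGIN", "DEST"]
START = {"QUERY": 0.8, "ORIGIN": 0.1, "DEST": 0.1}
TRANS = {
    "QUERY": {"QUERY": 0.6, "ORIGIN": 0.3, "DEST": 0.1},
    "ORIGIN": {"QUERY": 0.1, "ORIGIN": 0.6, "DEST": 0.3},
    "DEST": {"QUERY": 0.1, "ORIGIN": 0.3, "DEST": 0.6},
}
EMIT = {
    "QUERY": {"show": 0.5, "flights": 0.5},
    "ORIGIN": {"from": 0.4, "boston": 0.4, "denver": 0.2},
    "DEST": {"to": 0.4, "boston": 0.2, "denver": 0.4},
}
words = "show flights from boston to denver".split()
print(viterbi(words, CONCEPTS, START, TRANS, EMIT))
# ['QUERY', 'QUERY', 'ORIGIN', 'ORIGIN', 'DEST', 'DEST']
```

Working in log space avoids numeric underflow on longer sentences; each word costs one pass over all concept pairs.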

International Conference on Computational Linguistics | 1996

Using word class for part-of-speech disambiguation

Evelyne Tzoukermann; Dragomir R. Radev

This paper presents a methodology for improving part-of-speech disambiguation using word classes. We build on earlier work on tagging French, in which we showed that statistical estimates can be computed without lexical probabilities. We investigate new ways of deriving different kinds of probabilities based on paradigms of tags for given words. We base estimates not on the words, but on the set of tags associated with a word. We compute frequencies of unigrams, bigrams, and trigrams of word classes in order to further refine the disambiguation. This new approach gives a more efficient representation of the data for disambiguating part of speech. We show empirical results to support our claim. We demonstrate that, besides providing good estimates for disambiguation, word classes solve some of the problems caused by sparse training data. We describe a part-of-speech tagger built on these principles and suggest a methodology for developing an adequate training corpus.
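
The core idea, replacing each word by its set of possible tags (its word class) and collecting n-gram statistics over those classes, can be sketched as follows. The mini-lexicon, tag names, and function names are hypothetical, and the sketch shows only the counting step, not the full tagger. Because "porte" and "marche" share the class NOUN/VERB, sequences containing either word contribute to the same counts, which illustrates how word classes can pool sparse training data.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

WordClass = Tuple[str, ...]

# Hypothetical mini-lexicon: each French word maps to all tags it can take.
LEXICON: Dict[str, FrozenSet[str]] = {
    "la": frozenset({"DET", "PRON"}),
    "porte": frozenset({"NOUN", "VERB"}),
    "marche": frozenset({"NOUN", "VERB"}),
    "ferme": frozenset({"ADJ", "NOUN", "VERB"}),
    "il": frozenset({"PRON"}),
}


def word_class(word: str) -> WordClass:
    """The sorted tuple of a word's possible tags; unknown words get ("UNK",)."""
    return tuple(sorted(LEXICON.get(word.lower(), frozenset({"UNK"}))))


def class_ngrams(sentences: List[List[str]], n: int) -> Counter:
    """Count n-grams of word classes (n = 1, 2, 3 for uni/bi/trigrams)."""
    counts: Counter = Counter()
    for sentence in sentences:
        classes = [word_class(w) for w in sentence]
        for i in range(len(classes) - n + 1):
            counts[tuple(classes[i:i + n])] += 1
    return counts


corpus = [["la", "porte", "ferme"], ["il", "ferme", "la", "marche"]]
bigrams = class_ngrams(corpus, 2)
print(bigrams[(("DET", "PRON"), ("NOUN", "VERB"))])  # 2
```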
