Anatole Gershman | Researchain

Publication

Featured researches published by Anatole Gershman.

meeting of the association for computational linguistics | 2014

Metaphor Detection with Cross-Lingual Model Transfer

Yulia Tsvetkov; Leonid Boytsov; Anatole Gershman; Eric Nyberg; Chris Dyer

We show that it is possible to reliably discriminate whether a syntactic construction is meant literally or metaphorically using lexical semantic features of the words that participate in the construction. Our model is constructed using English resources, and we obtain state-of-the-art performance relative to previous work in this language. Using a model transfer approach by pivoting through a bilingual dictionary, we show our model can identify metaphoric expressions in other languages. We provide results on three new test sets in Spanish, Farsi, and Russian. The results support the hypothesis that metaphors are conceptual, rather than lexical, in nature.

international joint conference on natural language processing | 2015

Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding

Yun-Nung Chen; William Yang Wang; Anatole Gershman; Alexander I. Rudnicky

Spoken dialogue systems (SDS) typically require a predefined semantic ontology to train a spoken language understanding (SLU) module. In addition to the annotation cost, a key challenge for designing such an ontology is to define a coherent slot set while considering their complex relations. This paper introduces a novel matrix factorization (MF) approach to learn latent feature vectors for utterances and semantic elements without the need for corpus annotations. Specifically, our model learns the semantic slots for a domain-specific SDS in an unsupervised fashion, and carries out semantic parsing using latent MF techniques. To further consider the global semantic structure, such as inter-word and inter-slot relations, we augment the latent MF-based model with a knowledge graph propagation model based on a slot-based semantic graph and a word-based lexical graph. Our experiments show that the proposed MF approaches produce better SLU models that are able to predict semantic slots and word patterns, jointly taking into account their relations and domain-specificity.

Knowledge Based Systems | 2016

Exploring events and distributed representations of text in multi-document summarization

Luís Marujo; Wang Ling; Ricardo Ribeiro; Anatole Gershman; Jaime G. Carbonell; David Martins de Matos; João Paulo Neto

We explore an event detection framework to improve multi-document summarization. We use distributed representations of text to address different lexical realizations. Summarization is based on the hierarchical combination of single-document summaries. We performed an automatic evaluation and a human study of the generated summaries. Quantitative and qualitative results show clear improvements over the state-of-the-art.

In this article, we explore an event detection framework to improve multi-document summarization. Our approach is based on a two-stage single-document method that extracts a collection of key phrases, which are then used in a centrality-as-relevance passage retrieval model. We explore how to adapt this single-document method for multi-document summarization methods that are able to use event information. The event detection method is based on Fuzzy Fingerprint, which is a supervised method trained on documents with annotated event tags. To cope with the possible usage of different terms to describe the same event, we explore distributed representations of text in the form of word embeddings, which helped improve the summarization results. The proposed summarization methods are based on the hierarchical combination of single-document summaries. The automatic evaluation and human study performed show that these methods improve upon current state-of-the-art multi-document summarization systems on two mainstream evaluation datasets, DUC 2007 and TAC 2009. We show a relative improvement in ROUGE-1 scores of 16% for TAC 2009 and of 17% for DUC 2007.

international acm sigir conference on research and development in information retrieval | 2013

Self reinforcement for important passage retrieval

Ricardo Ribeiro; Luís Marujo; David Martins de Matos; João Paulo Neto; Anatole Gershman; Jaime G. Carbonell

In general, centrality-based retrieval models treat all elements of the retrieval space equally, which may reduce their effectiveness. In the specific context of extractive summarization (or important passage retrieval), this means that these models do not take into account that information sources often contain lateral issues, which are hardly as important as the description of the main topic, or are composed of mixtures of topics. We present a new two-stage method that starts by extracting a collection of key phrases that are then used to help a centrality-as-relevance retrieval model. We explore several approaches to the integration of the key phrases in the centrality model. The proposed method is evaluated using different datasets that vary in noise (noisy vs clean) and language (Portuguese vs English). Results show that the best variant achieves relative performance improvements of about 31% in clean data and 18% in noisy data.

text speech and dialogue | 2012

Key Phrase Extraction of Lightly Filtered Broadcast News

Luís Marujo; Ricardo Ribeiro; David Martins de Matos; João Paulo Neto; Anatole Gershman; Jaime G. Carbonell

This paper explores the impact of light filtering on automatic key phrase extraction (AKE) applied to Broadcast News (BN). Key phrases are words and expressions that best characterize the content of a document. Key phrases are often used to index the document or as features in further processing. This makes improvements in AKE accuracy particularly important. We hypothesized that filtering out marginally relevant sentences from a document would improve AKE accuracy. Our experiments confirmed this hypothesis. Elimination of as little as 10% of the document sentences led to a 2% improvement in AKE precision and recall. AKE is built on the MAUI toolkit, which follows a supervised learning approach. We trained and tested our AKE method on a gold standard made of 8 BN programs containing 110 manually annotated news stories. The experiments were conducted within a Multimedia Monitoring Solution (MMS) system for TV and radio news/programs, running daily, and monitoring 12 TV and 4 radio channels.

joint conference on lexical and computational semantics | 2015

Extending a Single-Document Summarizer to Multi-Document: a Hierarchical Approach

Luís Marujo; Ricardo Ribeiro; David Martins de Matos; João Paulo Neto; Anatole Gershman; Jaime G. Carbonell

The increasing amount of online content motivated the development of multi-document summarization methods. In this work, we explore straightforward approaches to extend single-document summarization methods to multi-document summarization. The proposed methods are based on the hierarchical combination of single-document summaries, and achieve state-of-the-art results.

international joint conference on natural language processing | 2015

Automatic Keyword Extraction on Twitter

Luís Marujo; Wang Ling; Isabel Trancoso; Chris Dyer; Alan W. Black; Anatole Gershman; David Martins de Matos; João Paulo Neto; Jaime G. Carbonell

In this paper, we build a corpus of tweets from Twitter annotated with keywords using crowdsourcing methods. We identify key differences between this domain and the work performed on other domains, such as news, which cause existing approaches for automatic keyword extraction to generalize poorly on Twitter datasets. These differences include the small amount of content in each tweet, the frequent usage of lexical variants and the high variance of the cardinality of keywords present in each tweet. We propose methods for addressing these issues, which lead to solid improvements on this dataset for this task.

IEEE Conf. on Intelligent Systems (1) | 2015

Textual Event Detection Using Fuzzy Fingerprints

Luís Marujo; João Paulo Carvalho; Anatole Gershman; Jaime G. Carbonell; João Paulo Neto; David Martins de Matos

In this paper we present a method to improve the automatic detection of events in short sentences in the presence of a large number of event classes. Contrary to standard classification techniques such as Support Vector Machines or Random Forest, the proposed Fuzzy Fingerprints method is able to detect all the event classes present in the ACE 2005 Multilingual Corpus, and substantially improves the obtained G-Mean value.
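
To illustrate why detecting every event class matters for the G-Mean, here is a minimal sketch. It assumes the common multi-class definition of G-Mean as the geometric mean of per-class recalls; the abstract does not spell out the exact formula used, and the function names and event labels below are hypothetical.

```python
from math import prod


def per_class_recall(gold, pred):
    """Return {label: recall} for every label that occurs in the gold labels."""
    totals, hits = {}, {}
    for g, p in zip(gold, pred):
        totals[g] = totals.get(g, 0) + 1
        if g == p:
            hits[g] = hits.get(g, 0) + 1
    return {label: hits.get(label, 0) / totals[label] for label in totals}


def g_mean(gold, pred):
    """Geometric mean of per-class recalls (assumed definition of G-Mean)."""
    recalls = list(per_class_recall(gold, pred).values())
    if not recalls:
        raise ValueError("gold labels must not be empty")
    return prod(recalls) ** (1.0 / len(recalls))


# Hypothetical event labels, for illustration only.
gold = ["attack", "attack", "meet", "meet", "die"]
pred_covers_all = ["attack", "meet", "meet", "meet", "die"]       # every class found at least once
pred_misses_one = ["attack", "attack", "meet", "meet", "meet"]    # the rare class "die" is never found

print(g_mean(gold, pred_covers_all))
print(g_mean(gold, pred_misses_one))
```

Under this definition, a classifier that never detects one class scores a G-Mean of zero no matter how accurate it is on the others, so the measure rewards coverage of all classes, including rare ones.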

systems, man and cybernetics | 2009

Analysis of uncertain data: Evaluation of given hypotheses

Anatole Gershman; Eugene Fink; Bin Fu; Jaime G. Carbonell

We consider the problem of heuristic evaluation of given hypotheses based on limited observations, in situations when available data are insufficient for rigorous statistical analysis.

systems, man and cybernetics | 2009

Analysis of uncertain data: Selection of probes for information gathering

Anatole Gershman; Eugene Fink; Bin Fu; Jaime G. Carbonell

We consider the problem of gathering data for evaluation of given hypotheses, and describe a method for analyzing tradeoffs between the expected utility and the cost of data collection.