Jonathan Mamou
IBM
Publication
Featured research published by Jonathan Mamou.
international acm sigir conference on research and development in information retrieval | 2007
Jonathan Mamou; Bhuvana Ramabhadran; Olivier Siohan
We are interested in retrieving information from speech data like broadcast news, telephone conversations and roundtable meetings. Today, most systems use large vocabulary continuous speech recognition tools to produce word transcripts; the transcripts are indexed and query terms are retrieved from the index. However, query terms that are not part of the recognizer's vocabulary cannot be retrieved, and the recall of the search is affected. In addition to the output word transcript, advanced systems also provide phonetic transcripts, against which query terms can be matched phonetically. Such phonetic transcripts suffer from lower accuracy and cannot serve as an alternative to word transcripts. We present a vocabulary-independent system that can handle arbitrary queries, exploiting the information provided by having both word transcripts and phonetic transcripts. A speech recognizer generates word confusion networks and phonetic lattices. The transcripts are indexed for query processing and ranking purposes. The value of the proposed method is demonstrated by the relatively high performance of our system, which received the highest overall ranking for US English speech data in the recent NIST Spoken Term Detection evaluation.
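The word-transcript half of such a system can be pictured as indexing a word confusion network (WCN): a sequence of "bins", each holding competing (word, posterior) hypotheses. The following is a minimal hypothetical sketch, not the paper's implementation; the data and function names are illustrative.

```python
from collections import defaultdict

def index_wcn(doc_id, wcn, index):
    """Add one utterance's WCN to an inverted index.

    wcn: list of bins, each a list of (word, posterior) pairs.
    index: maps word -> list of (doc_id, bin_position, posterior).
    """
    for pos, hypotheses in enumerate(wcn):
        for word, posterior in hypotheses:
            index[word].append((doc_id, pos, posterior))

# Toy WCN for an utterance with three bins of competing hypotheses.
index = defaultdict(list)
wcn = [[("spoken", 0.9), ("smoking", 0.1)],
       [("term", 0.7), ("turn", 0.3)],
       [("detection", 0.95)]]
index_wcn("utt1", wcn, index)
print(index["term"])  # [('utt1', 1, 0.7)]
```

Keeping the bin position and posterior in each posting is what later lets the search engine verify phrase adjacency and rank hits by recognition confidence.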
international acm sigir conference on research and development in information retrieval | 2006
Jonathan Mamou; David Carmel; Ron Hoory
We are interested in retrieving information from conversational speech corpora, such as call-center data. This data comprises spontaneous speech conversations with low recording quality, which makes automatic speech recognition (ASR) a highly difficult task. For typical call-center data, even state-of-the-art large vocabulary continuous speech recognition systems produce a transcript with a word error rate of 30% or higher. In addition to the output transcript, advanced systems provide word confusion networks (WCNs), a compact representation of word lattices that associates each word hypothesis with its posterior probability. Our work exploits the information provided by WCNs in order to improve retrieval performance. In this paper, we show that mean average precision (MAP) is improved using WCNs compared to the raw word transcripts. Finally, we analyze the effect of increasing ASR word error rate on search effectiveness and show that MAP remains reasonable even under extremely high error rates.
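The core idea of scoring with a WCN rather than a 1-best transcript can be sketched as follows: a query term contributes its posterior probability wherever it appears as a hypothesis, so uncertain recognitions count fractionally instead of 0 or 1. This is a hypothetical illustration of the principle, not the paper's exact ranking function.

```python
def wcn_term_score(query_terms, wcn):
    """Accumulate the posterior mass of query-term hypotheses over all
    WCN bins. wcn: list of bins of (word, posterior) pairs."""
    score = 0.0
    for hypotheses in wcn:
        for word, posterior in hypotheses:
            if word in query_terms:
                score += posterior
    return score

# "call" and "center" are both hypothesized, with some uncertainty.
wcn = [[("call", 0.75), ("cold", 0.25)],
       [("center", 0.5), ("enter", 0.5)]]
print(wcn_term_score({"call", "center"}, wcn))  # 1.25
```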
international conference on acoustics, speech, and signal processing | 2013
Jonathan Mamou; Jia Cui; Xiaodong Cui; Mark J. F. Gales; Brian Kingsbury; Kate Knill; Lidia Mangu; David Nolden; Michael Picheny; Bhuvana Ramabhadran; Ralf Schlüter; Abhinav Sethy; Philip C. Woodland
Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from information retrieval to Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of alternative methods for improving keyword search performance and apply them to Cantonese, a language that presents new issues in terms of reduced resources and shorter query lengths. First, we present a score normalization methodology that improves keyword search performance by 20% on average. Second, we show that properly combining the outputs of diverse ASR systems performs 14% better than the best normalized ASR system.
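One common family of keyword-search score normalizations rescales each keyword's detection scores so they are comparable across keywords before a single global threshold is applied; sum-to-one normalization is a simple member of that family. The sketch below illustrates the idea with made-up data and does not claim to be the paper's exact method.

```python
from collections import defaultdict

def sum_to_one_normalize(hits):
    """hits: list of (keyword, location, score). Each keyword's scores
    are rescaled to sum to 1, putting rare and frequent keywords on a
    shared scale before a single global decision threshold is applied."""
    totals = defaultdict(float)
    for kw, _, score in hits:
        totals[kw] += score
    return [(kw, loc, score / totals[kw]) for kw, loc, score in hits]

# Illustrative hit lists for two keywords with very different raw scales.
hits = [("kw_a", 1, 2.0), ("kw_a", 2, 2.0), ("kw_b", 3, 0.5)]
print(sum_to_one_normalize(hits))
# [('kw_a', 1, 0.5), ('kw_a', 2, 0.5), ('kw_b', 3, 1.0)]
```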
international conference on acoustics, speech, and signal processing | 2013
Brian Kingsbury; Jia Cui; Xiaodong Cui; Mark J. F. Gales; Kate Knill; Jonathan Mamou; Lidia Mangu; David Nolden; Michael Picheny; Bhuvana Ramabhadran; Ralf Schlüter; Abhinav Sethy; Philip C. Woodland
We present a system for keyword search on Cantonese conversational telephony audio, collected for the IARPA Babel program, that achieves good performance by combining postings lists produced by diverse speech recognition systems from three different research groups. We describe the keyword search task, the data on which the work was done, four different speech recognition systems, and our approach to system combination for keyword search. We show that the combination of four systems outperforms the best single system by 7%, achieving an actual term-weighted value of 0.517.
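The actual term-weighted value (ATWV) quoted above averages, over keywords, one minus a weighted sum of miss and false-alarm probabilities; NIST's evaluations use a weight of 999.9 and derive false-alarm trials from the audio duration in seconds. A minimal sketch of the metric, with illustrative counts:

```python
def twv(stats, audio_seconds, beta=999.9):
    """stats: keyword -> (n_true, n_correct, n_false_alarm).
    Returns the average over keywords of 1 - (P_miss + beta * P_fa),
    where P_fa uses audio duration minus true hits as the trial count."""
    total = 0.0
    for n_true, n_correct, n_fa in stats.values():
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_fa / (audio_seconds - n_true)
        total += 1.0 - (p_miss + beta * p_fa)
    return total / len(stats)

# One keyword: 10 true occurrences, 8 found, 1 false alarm, in 1 hour.
stats = {"keyword": (10, 8, 1)}
v = twv(stats, audio_seconds=3600.0)
```

Because beta is so large, even a single false alarm per hour costs roughly as much as a 28% miss rate, which is why threshold tuning matters so much in these systems.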
international conference on acoustics, speech, and signal processing | 2013
Jia Cui; Xiaodong Cui; Bhuvana Ramabhadran; Janice Kim; Brian Kingsbury; Jonathan Mamou; Lidia Mangu; Michael Picheny; Tara N. Sainath; Abhinav Sethy
Automatic speech recognition is a core component of many applications, including keyword search. In this paper we describe experiments on acoustic modeling, language modeling, and decoding for keyword search on a Cantonese conversational telephony corpus collected as part of the IARPA Babel program. We show that acoustic modeling techniques such as the bootstrapped-and-restructured model and deep neural network acoustic model significantly outperform a state-of-the-art baseline GMM/HMM model, in terms of both recognition performance and keyword search performance, with improvements of up to 11% relative character error rate reduction and 31% relative maximum term weighted value improvement. We show that while an interpolated Model M and neural network LM improve recognition performance, they do not improve keyword search results; however, the advanced LM does reduce the size of the keyword search index. Finally, we show that a simple form of automatically adapted keyword search performs 16% better than a preindexed search system, indicating that out-of-vocabulary search is still a challenge.
Proceedings of the 1st international workshop on Multimodal crowd sensing | 2012
Haggai Roitman; Jonathan Mamou; Sameep Mehta; Aharon Satt; L. V. Subramaniam
In this work we discuss the challenge of harnessing the crowd for smart city sensing. Within a city's context, reports by citizen or visitor eyewitnesses may provide important information to city officials, in addition to more traditional data gathered by other means (e.g., through the city's control center, emergency services, or sensors spread across the city). We present a high-level overview of a novel crowd sensing system that we are developing at IBM for the smart cities domain. As a proof of concept, we present some preliminary results using public safety as our example use case.
ieee automatic speech recognition and understanding workshop | 2013
Murat Saraclar; Abhinav Sethy; Bhuvana Ramabhadran; Lidia Mangu; Jia Cui; Xiaodong Cui; Brian Kingsbury; Jonathan Mamou
Keyword search, in the context of low-resource languages, has emerged as a key area of research. The dominant approach in keyword search is to use Automatic Speech Recognition (ASR) as a front end to produce a representation of audio that can be indexed. The biggest drawback of this approach lies in its inability to deal with out-of-vocabulary words and query terms that are not in the ASR system output. In this paper we present an empirical study evaluating various approaches that use confusion models as query expansion techniques to address this problem. We present results across four languages using a range of confusion models which lead to significant improvements in keyword search performance as measured by the Maximum Term Weighted Value (MTWV) metric.
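Confusion-model query expansion can be pictured as generating acoustically plausible variants of a query's phone sequence and searching for each variant, weighted by its confusion probability. The sketch below is a hypothetical single-substitution expander; the confusion pairs and probabilities are invented for illustration and real systems typically learn them from ASR errors.

```python
def expand(phones, confusions, max_variants=10):
    """phones: phone sequence of the query.
    confusions: phone -> list of (confusable_phone, probability).
    Returns (variant, score) pairs, best first; the original query
    keeps score 1.0 and substitutions multiply in their probabilities."""
    variants = [(tuple(phones), 1.0)]
    for i, phone in enumerate(phones):
        substituted = []
        for alt, prob in confusions.get(phone, []):
            for var, var_score in variants:
                new_var = list(var)
                new_var[i] = alt
                substituted.append((tuple(new_var), var_score * prob))
        variants.extend(substituted)
    variants.sort(key=lambda v: -v[1])
    return variants[:max_variants]

# Toy confusion model: /t/ is sometimes recognized as /d/.
out = expand(["t", "eh", "s", "t"], {"t": [("d", 0.2)]})
```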
international conference on acoustics, speech, and signal processing | 2014
Jia Cui; Jonathan Mamou; Brian Kingsbury; Bhuvana Ramabhadran
In this paper, we investigate the problem of automatically selecting textual keywords for keyword search development and tuning on audio data for any language. Briefly, the method samples candidate keywords in the training data while trying to match a set of target marginal distributions for keyword features such as keyword frequency in the training or development audio, keyword length, frequency of out-of-vocabulary words, and TF-IDF scores. The method is evaluated on four IARPA Babel program base period languages. We demonstrate the use of the automatically selected keywords for keyword search system development and tuning, and we also show that search performance is improved by tuning the decision threshold on the automatically selected keywords.
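The sampling step can be illustrated with a toy version that matches a single feature marginal; the paper matches several feature marginals jointly (frequency, length, OOV rate, TF-IDF), so this hypothetical one-feature sketch only shows the shape of the idea.

```python
import random

def sample_keywords(candidates, target_length_dist, n, seed=0):
    """candidates: list of candidate keyword strings.
    target_length_dist: keyword length -> target probability.
    Draws about n keywords whose length marginal matches the target."""
    rng = random.Random(seed)
    by_len = {}
    for word in candidates:
        by_len.setdefault(len(word), []).append(word)
    picked = []
    for length, prob in target_length_dist.items():
        pool = by_len.get(length, [])
        k = min(len(pool), round(prob * n))
        picked.extend(rng.sample(pool, k))
    return picked

# Target: half the keywords of length 2, half of length 3.
cands = ["ab", "cd", "ef", "ghi", "jkl", "m"]
out = sample_keywords(cands, {2: 0.5, 3: 0.5}, n=4)
```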
international acm sigir conference on research and development in information retrieval | 2009
Jonathan Mamou; Yosi Mass; Michal Shmueli-Scheuer; Benjamin Sznajder
We present an efficient method for approximate search in a combination of several metric spaces -- a generalization of low-level image features -- using an inverted index. Our approximation gives very high recall with sub-second response time on a real data set of one million images extracted from Flickr. We further exploit the inverted index to improve the efficiency of query processing by combining our search in metric features with search in the associated textual metadata.
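One standard way to put a metric space behind an inverted index is the reference-object approach: each object is indexed under its k nearest reference points, and a query only examines objects that share reference points with it. The sketch below shows that approach in one dimension with illustrative data; it is an assumption-laden toy, not the paper's method.

```python
def nearest_refs(x, refs, dist, k):
    """Indices of the k reference points closest to x."""
    return sorted(range(len(refs)), key=lambda i: dist(x, refs[i]))[:k]

def build_index(objects, refs, dist, k):
    """Inverted index: reference-point id -> list of object ids."""
    index = {}
    for oid, obj in enumerate(objects):
        for r in nearest_refs(obj, refs, dist, k):
            index.setdefault(r, []).append(oid)
    return index

def search(query, objects, refs, index, dist, k):
    """Approximate nearest neighbor: rank only the candidates that
    share a nearby reference point with the query."""
    cands = {oid for r in nearest_refs(query, refs, dist, k)
             for oid in index.get(r, [])}
    return min(cands, key=lambda oid: dist(query, objects[oid]))

# Toy 1-D metric space with absolute difference as the distance.
dist = lambda a, b: abs(a - b)
objects = [0.0, 5.0, 9.0]
refs = [1.0, 6.0, 10.0]
idx = build_index(objects, refs, dist, k=2)
print(search(4.8, objects, refs, idx, dist, k=2))  # 1
```

Because candidate generation touches only posting lists, the same engine can intersect metric postings with textual-metadata postings, which is the combination the abstract describes.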
international conference on acoustics, speech, and signal processing | 2011
Alexander Sorin; Hagai Aronowitz; Jonathan Mamou; Orith Toledo-Ronen; Ron Hoory; Michael Kuritzky; Yael Erez; Bhuvana Ramabhadran; Abhinav Sethy
The paper presents a new application of automatic speech processing in the Ambient Assisted Living area, developed in the course of a three-year research project. Recording and automatic processing of spoken conversations play a major role in this solution, enabling effective search in a personal audio archive and fast browsing of conversations. Processing of elderly conversational speech recorded by a distant PDA microphone poses a great challenge. The speech processing flow includes transcription, speaker tracking, and combined indexing and search of spoken terms and participating speakers' identities extracted from the audio. We present the entire application and the individual speech processing components, as well as evaluation results for the individual components and for the end-to-end spoken information retrieval solution.