
Publication


Featured research published by Eugene Weinstein.


International Conference on Multimodal Interfaces | 2003

Towards robust person recognition on handheld devices using face and speaker identification technologies

Timothy J. Hazen; Eugene Weinstein; Alex Park

Most face and speaker identification techniques are tested on data collected in controlled environments using high quality cameras and microphones. However, using these technologies in variable environments, with the inexpensive sound and image capture hardware present in mobile devices, presents an additional challenge. In this study, we investigate the application of existing face and speaker identification techniques to a person identification task on a handheld device. These techniques have proven to perform accurately in tightly constrained experiments where the lighting conditions, visual backgrounds, and audio environments are fixed and specifically adjusted for optimal data quality. When these techniques are applied on mobile devices, where the visual and audio conditions are highly variable, degradations in performance can be expected. Under these circumstances, the combination of multiple biometric modalities can improve the robustness and accuracy of the person identification task. In this paper, we present our approach for combining face and speaker identification technologies and experimentally demonstrate a fused multi-biometric system which achieves a 50% reduction in equal error rate over the better of the two independent systems.
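The general idea of score-level fusion and equal error rate can be illustrated with a minimal sketch. This is not the paper's actual fusion method: the weighted sum, the weight `w`, and all scores below are illustrative assumptions.

```python
import numpy as np

def fuse_scores(face_scores, speaker_scores, w=0.5):
    """Late (score-level) fusion: weighted sum of per-modality match scores."""
    return w * np.asarray(face_scores) + (1 - w) * np.asarray(speaker_scores)

def equal_error_rate(genuine, impostor):
    """Sweep thresholds and return the rate where false-accept and
    false-reject rates are (approximately) equal."""
    genuine = np.asarray(genuine)
    impostor = np.asarray(impostor)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer = 1.0, None
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine users wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

A fused system improves when the two modalities make different errors, so the combined score separates genuine and impostor trials better than either modality alone.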


CADUI | 2005

A Framework for Developing Conversational User Interfaces

James R. Glass; Eugene Weinstein; Scott Cyphers; Joseph Polifroni; Grace Chung; Mikio Nakano

In this work we report our efforts to facilitate the creation of mixed-initiative conversational interfaces for novice and experienced developers of human language technology. Our focus has been on a framework that allows developers to easily specify the basic concepts of their applications, and rapidly prototype conversational interfaces for a variety of configurations. In this paper we describe the current knowledge representation, the compilation processes for speech understanding, generation, and dialogue turn management, as well as the user interfaces created for novice users and more experienced developers. Finally, we report our experiences with several user groups in which developers used this framework to prototype a variety of conversational interfaces.
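The core notion of specifying application concepts that are compiled into an understanding component can be sketched in miniature. Everything here is illustrative: the real framework compiles far richer grammars, generation, and dialogue management; `compile_understander` and the weather domain are hypothetical.

```python
def compile_understander(concepts):
    """Compile a toy concept specification {concept_name: [trigger phrases]}
    into a keyword-spotting parse function returning a concept frame."""
    def understand(utterance):
        utterance = utterance.lower()
        frame = {}
        for name, phrases in concepts.items():
            for phrase in phrases:
                if phrase in utterance:
                    frame[name] = phrase   # record the first matching phrase
                    break
        return frame
    return understand

# A developer-specified toy domain: two concepts, a few trigger phrases each.
weather = compile_understander({
    'city': ['boston', 'seattle'],
    'when': ['today', 'tomorrow'],
})
```

The appeal of this style is that the developer edits only the declarative specification; the compilation step produces the runtime understanding function.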


International Conference on Acoustics, Speech, and Signal Processing | 2007

Music Identification with Weighted Finite-State Transducers

Eugene Weinstein; Pedro J. Moreno

Music identification is the process of matching an audio stream to a particular song. Previous work has relied on hashing, where an exact or almost-exact match between local features of the test and reference recordings is required. In this work we present a new approach to music identification based on finite-state transducers and Gaussian mixture models. We apply an unsupervised training process to learn an inventory of music phone units similar to phonemes in speech. We also learn a unique sequence of music units characterizing each song. We further propose a novel application of transducers for recognition of music phone sequences. Preliminary experiments demonstrate an identification accuracy of 99.5% on a database of over 15,000 songs running faster than real time.
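The unsupervised learning of music phone units can be sketched with a simplified stand-in: the paper uses Gaussian mixture models, whereas the toy below clusters acoustic frames with plain k-means (deterministic initialization, illustrative names throughout) and then collapses the frame labels into a song's characteristic unit sequence.

```python
import numpy as np

def learn_music_phones(frames, k, iters=20):
    """Toy 'music phone' inventory: k-means over acoustic frame features.
    Each centroid plays the role of one unit (a stand-in for a GMM)."""
    # Deterministic init: spread initial centroids across the frame sequence.
    centroids = frames[np.linspace(0, len(frames) - 1, k).astype(int)].copy()
    labels = np.zeros(len(frames), dtype=int)
    for _ in range(iters):
        # Assign each frame to its nearest unit.
        d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = frames[labels == j].mean(axis=0)
    return centroids, labels

def song_unit_sequence(labels):
    """Collapse consecutive repeats into the song's unit sequence
    (analogous to a phone transcription of the song)."""
    seq = [labels[0]]
    for u in labels[1:]:
        if u != seq[-1]:
            seq.append(u)
    return seq
```

The resulting per-song unit sequences are what a transducer-based recognizer would then match against at identification time.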


International Conference on Implementation and Application of Automata | 2007

Factor automata of automata and applications

Mehryar Mohri; Pedro J. Moreno; Eugene Weinstein

An efficient data structure for representing the full index of a set of strings is the factor automaton, the minimal deterministic automaton representing the set of all factors or substrings of these strings. This paper presents a novel analysis of the size of the factor automaton of an automaton, that is, the minimal deterministic automaton accepting the set of factors of a finite set of strings, itself represented by a finite automaton. It shows that the factor automaton of a set of strings U has at most 2|Q| - 2 states, where |Q| is the number of nodes of a prefix-tree representing the strings in U, a bound that significantly improves over 2||U|| - 1, the bound given by Blumer et al. (1987), where ||U|| is the sum of the lengths of all strings in U. It also gives novel and general bounds for the size of the factor automaton of an automaton as a function of the size of the original automaton and the maximal length of a suffix shared by the strings it accepts. Our analysis suggests that the use of factor automata of automata can be practical for large-scale applications, a fact that is further supported by the results of our experiments applying factor automata to a music identification task with more than 15,000 songs.
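For the single-string case the factor automaton is the classical suffix automaton, and the bound can be checked directly: for U = {s} the prefix tree has |s| + 1 nodes, so the paper's 2|Q| - 2 bound reads 2|s| states. Below is a sketch using the standard online suffix automaton construction (the paper's contribution, the generalization to sets of strings represented by automata, is not reproduced here).

```python
def suffix_automaton(s):
    """Build the suffix (factor) automaton of a single string:
    the minimal DFA accepting every substring of s."""
    sa = [{'len': 0, 'link': -1, 'next': {}}]
    last = 0
    for ch in s:
        cur = len(sa)
        sa.append({'len': sa[last]['len'] + 1, 'link': -1, 'next': {}})
        p = last
        while p != -1 and ch not in sa[p]['next']:
            sa[p]['next'][ch] = cur
            p = sa[p]['link']
        if p == -1:
            sa[cur]['link'] = 0
        else:
            q = sa[p]['next'][ch]
            if sa[p]['len'] + 1 == sa[q]['len']:
                sa[cur]['link'] = q
            else:
                # Clone q so that transitions stay deterministic and minimal.
                clone = len(sa)
                sa.append({'len': sa[p]['len'] + 1,
                           'link': sa[q]['link'],
                           'next': dict(sa[q]['next'])})
                while p != -1 and sa[p]['next'].get(ch) == q:
                    sa[p]['next'][ch] = clone
                    p = sa[p]['link']
                sa[q]['link'] = clone
                sa[cur]['link'] = clone
        last = cur
    return sa

def accepts_factor(sa, t):
    """Every state is accepting: t is a factor iff the walk succeeds."""
    state = 0
    for ch in t:
        state = sa[state]['next'].get(ch)
        if state is None:
            return False
    return True
```

The construction is online and linear in |s| (for a fixed alphabet), which is what makes factor indexes practical at the scale of thousands of songs.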


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Efficient and Robust Music Identification With Weighted Finite-State Transducers

Mehryar Mohri; Pedro J. Moreno; Eugene Weinstein

We present an approach to music identification based on weighted finite-state transducers and Gaussian mixture models, inspired by techniques used in large-vocabulary speech recognition. Our modeling approach is based on learning a set of elementary music sounds in a fully unsupervised manner. While the space of possible music sound sequences is very large, our method enables the construction of a compact and efficient representation for the song collection using finite-state transducers. This paper gives a novel and substantially faster algorithm for the construction of factor transducers, the key representation of song snippets supporting our music identification technique. The complexity of our algorithm is linear with respect to the size of the suffix automaton constructed. Our experiments further show that it helps speed up the construction of the weighted suffix automaton in our task by a factor of 17 with respect to our previous method using the intermediate steps of determinization and minimization. We show that, using these techniques, a large-scale music identification system can be constructed for a database of over 15 000 songs while achieving an identification accuracy of 99.4% on undistorted test data, and performing robustly in the presence of noise and distortions.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Mobile music modeling, analysis and recognition

Pavel Golik; Boulos Harb; Ananya Misra; Michael Riley; Alex Rudnick; Eugene Weinstein

We present an analysis of music modeling and recognition techniques in the context of mobile music matching, substantially improving on the techniques presented in [1]. We accomplish this by adapting the features specifically to this task, and by introducing new modeling techniques that enable using a corpus of noisy and channel-distorted data to improve mobile music recognition quality. We report the results of an extensive empirical investigation of the system's robustness under realistic channel effects and distortions. We show an improvement of recognition accuracy by explicit duration modeling of music phonemes and by integrating the expected noise environment into the training process. Finally, we propose the use of frame-to-phoneme alignment for high-level structure analysis of polyphonic music.


Spoken Language Technology Workshop | 2016

High quality agreement-based semi-supervised training data for acoustic modeling

Félix de Chaumont Quitry; Asa Oines; Pedro J. Moreno; Eugene Weinstein

This paper describes a new technique to automatically obtain large high-quality training speech corpora for acoustic modeling. Traditional approaches select utterances based on confidence thresholds and other heuristics. We propose instead to use an ensemble approach: we transcribe each utterance using several recognizers, and only keep those on which they agree. The recognizers we use are trained on data from different dialects of the same language, and this diversity leads them to make different mistakes in transcribing speech utterances. In this work we show, however, that when they agree, this is an extremely strong signal that the transcript is correct. This allows us to produce automatically transcribed speech corpora that are superior in transcript correctness even to those manually transcribed by humans. Furthermore, we show that using the produced semi-supervised data sets, we can train new acoustic models which outperform those trained solely on previously available data sets.
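The agreement filter itself is simple to sketch. The recognizers below are hypothetical stand-ins (plain callables mapping an utterance to a transcript string) for the dialect-specific ASR systems the paper trains.

```python
def agreement_filter(utterances, recognizers):
    """Keep only utterances on whose transcript all recognizers agree,
    pairing each kept utterance with the unanimous transcript."""
    kept = []
    for utt in utterances:
        hyps = {r(utt) for r in recognizers}
        if len(hyps) == 1:           # unanimous agreement: strong signal
            kept.append((utt, hyps.pop()))
    return kept
```

Because the recognizers are trained on different dialects, their errors are largely uncorrelated, so unanimous agreement yields a very high-precision (if smaller) training set.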


Conference of the International Speech Communication Association | 2001

SPEECHBUILDER: Facilitating Spoken Dialogue System Development

James R. Glass; Eugene Weinstein


Archive | 2004

A 1020-Node Modular Microphone Array and Beamformer for Intelligent Computing Spaces

Eugene Weinstein; Kenneth Steele; Anant Agarwal; James R. Glass


Archive | 2002

Handheld Face Identification Technology in a Pervasive Computing Environment

Eugene Weinstein; Purdy Ho; Bernd Heisele; Tomaso Poggio; Ken Steele; Anant Agarwal

Collaboration


Dive into Eugene Weinstein's collaborations.

Top Co-Authors

Mehryar Mohri, Courant Institute of Mathematical Sciences
James R. Glass, Massachusetts Institute of Technology
Alex Park, Massachusetts Institute of Technology
Timothy J. Hazen, Massachusetts Institute of Technology
Anant Agarwal, Massachusetts Institute of Technology
Alex Rudnick, Indiana University Bloomington