
Publication


Featured research published by Gautham J. Mysore.


International Conference on Computer Graphics and Interactive Techniques | 2014

The visual microphone: passive recovery of sound from video

Abe Davis; Michael Rubinstein; Neal Wadhwa; Gautham J. Mysore; William T. Freeman

When sound hits an object, it causes small vibrations of the object's surface. We show how, using only high-speed video of the object, we can extract those minute vibrations and partially recover the sound that produced them, allowing us to turn everyday objects---a glass of water, a potted plant, a box of tissues, or a bag of chips---into visual microphones. We recover sounds from high-speed footage of a variety of objects with different properties, and use both real and simulated data to examine some of the factors that affect our ability to visually recover sound. We evaluate the quality of recovered sounds using intelligibility and SNR metrics and provide input and recovered audio samples for direct comparison. We also explore how to leverage the rolling shutter in regular consumer cameras to recover audio from standard frame-rate videos, and use the spatial resolution of our method to visualize how sound-related vibrations vary over an object's surface, which we can use to recover the vibration modes of an object.
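
As a rough illustration of the recovery idea (and not the paper's actual pipeline, which analyzes local phase in a complex steerable pyramid with careful alignment and denoising), the numpy-only sketch below treats the recovered sound as the sub-pixel global displacement of each frame relative to a reference frame, estimated with FFT phase correlation. The camera frame rate, image size, and test tone are all made up.

import numpy as np

def subpixel_shift(ref, frame):
    """Estimate the (dy, dx) shift of `frame` relative to `ref` by locating the
    peak of the phase-correlation surface, refined with a parabolic fit."""
    F1 = np.fft.rfft2(ref)
    F2 = np.fft.rfft2(frame)
    cross = np.conj(F1) * F2
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.irfft2(cross, s=ref.shape)
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    shifts = []
    for axis, p in enumerate(peak):
        n = corr.shape[axis]
        c0 = corr[peak]
        idx_m = list(peak); idx_m[axis] = (p - 1) % n     # neighbors around the peak
        idx_p = list(peak); idx_p[axis] = (p + 1) % n
        cm, cp = corr[tuple(idx_m)], corr[tuple(idx_p)]
        denom = cm - 2 * c0 + cp
        frac = 0.0 if abs(denom) < 1e-12 else 0.5 * (cm - cp) / denom
        shift = p + frac
        if shift > n // 2:            # interpret large positive lags as negative
            shift -= n
        shifts.append(shift)
    return np.array(shifts)

def recover_signal(frames):
    """frames: array of shape (num_frames, height, width), grayscale video."""
    ref = frames[0].astype(float)
    motion = np.array([subpixel_shift(ref, f.astype(float)) for f in frames])
    signal = motion[:, 0] + motion[:, 1]     # collapse 2-D motion to a 1-D signal
    return signal - signal.mean()            # remove the DC offset

# Synthetic demo: a random texture vibrating horizontally with a 440 Hz tone,
# filmed by a hypothetical 2200 fps high-speed camera.
rng = np.random.default_rng(0)
texture = rng.random((64, 64))
fps, seconds = 2200, 0.05
t = np.arange(int(fps * seconds)) / fps
amplitude = 0.3 * np.sin(2 * np.pi * 440 * t)       # sub-pixel displacements
frames = []
for a in amplitude:
    shifted = np.fft.irfft2(np.fft.rfft2(texture) *
                            np.exp(-2j * np.pi * np.fft.rfftfreq(64) * a),
                            s=texture.shape)
    frames.append(shifted)
recovered = recover_signal(np.array(frames))
print("correlation with the true tone:",
      round(float(np.corrcoef(recovered, amplitude)[0, 1]), 3))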


Workshop on Applications of Signal Processing to Audio and Acoustics | 2009

Separation by “humming”: User-guided sound extraction from monophonic mixtures

Paris Smaragdis; Gautham J. Mysore

In this paper we present a novel approach for isolating and removing sounds from dense monophonic mixtures. The approach is user-based, and requires the presentation of a guide sound that mimics the desired target the user wishes to extract. The guide sound can be produced simply by the user vocalizing or otherwise replicating the target sound marked for separation. Using that guide as a prior in a statistical model of sound mixtures, we propose a methodology that allows us to efficiently extract complex structured sounds from dense mixtures.
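
The guide-as-prior idea can be approximated with plain NMF rather than the paper's PLCA formulation: learn a small spectral dictionary from the hummed guide, keep those atoms fixed while a second dictionary absorbs the rest of the mixture, then split the mixture with a soft mask. The sketch below is only that approximation; the spectrogram shapes, dictionary sizes, and function names are all hypothetical.

import numpy as np

def nmf_kl(V, W, H, fixed_cols=0, iters=200, eps=1e-9):
    """Multiplicative KL-divergence NMF updates for V ~= W @ H.
    The first `fixed_cols` columns of W are held fixed (e.g. guide atoms)."""
    ones = np.ones_like(V)
    for _ in range(iters):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
        WH = W @ H + eps
        W_new = W * ((V / WH) @ H.T) / (ones @ H.T + eps)
        W_new[:, :fixed_cols] = W[:, :fixed_cols]       # keep the fixed atoms
        scale = W_new.sum(axis=0, keepdims=True) + eps
        W = W_new / scale                               # normalize atoms...
        H *= scale.T                                    # ...and compensate in H
    return W, H

rng = np.random.default_rng(1)
n_freq, k_guide, k_other = 128, 8, 12

# Hypothetical magnitude spectrograms; in practice these would come from an
# STFT of the hummed guide and of the mixture recording.
V_guide = np.abs(rng.normal(size=(n_freq, 60)))
V_mix = np.abs(rng.normal(size=(n_freq, 200)))

# 1) Learn a small dictionary that describes the hummed guide.
W_g, _ = nmf_kl(V_guide,
                rng.random((n_freq, k_guide)) + 1e-3,
                rng.random((k_guide, V_guide.shape[1])) + 1e-3)

# 2) Explain the mixture with the guide atoms (fixed) plus free background atoms.
W0 = np.hstack([W_g, rng.random((n_freq, k_other)) + 1e-3])
H0 = rng.random((k_guide + k_other, V_mix.shape[1])) + 1e-3
W, H = nmf_kl(V_mix, W0, H0, fixed_cols=k_guide)

# 3) Soft mask: the target is the part of the mixture explained by guide atoms.
target = W[:, :k_guide] @ H[:k_guide]
background = W[:, k_guide:] @ H[k_guide:]
mask = target / (target + background + 1e-9)
separated = mask * V_mix        # masked magnitudes; invert the STFT to get audio
print("average mask value:", round(float(mask.mean()), 3))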


IEEE Signal Processing Magazine | 2014

Static and Dynamic Source Separation Using Nonnegative Factorizations: A unified view

Paris Smaragdis; Cédric Févotte; Gautham J. Mysore; Nasser Mohammadiha; Matthew D. Hoffman

Source separation models that make use of nonnegativity in their parameters have been gaining increasing popularity in the last few years, spawning a significant number of publications on the topic. Although these techniques are conceptually similar to other matrix decompositions, they are surprisingly more effective in extracting perceptually meaningful sources from complex mixtures. In this article, we will examine the various methodologies and extensions that make up this family of approaches and present them under a unified framework. We will begin with a short description of the basic concepts, and in the subsequent sections we will delve into more detail and explore some of the latest extensions.
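
One member of this family, and one that several of the papers on this page build on, is PLCA, the probabilistic counterpart of KL-divergence NMF. The sketch below is a minimal EM fit of the asymmetric PLCA model P(f, t) = P(t) * sum_z P(f|z) P(z|t) to a nonnegative matrix; the data, sizes, and variable names are arbitrary, and this is illustrative rather than the article's reference implementation.

import numpy as np

def plca(V, n_components, iters=100, eps=1e-12, seed=0):
    """Asymmetric PLCA on a nonnegative matrix V (freq x time), fit with EM."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Pt = V.sum(axis=0) / V.sum()                    # P(t), fixed by the data
    Pf_z = rng.random((F, n_components)); Pf_z /= Pf_z.sum(axis=0, keepdims=True)
    Pz_t = rng.random((n_components, T)); Pz_t /= Pz_t.sum(axis=0, keepdims=True)
    for _ in range(iters):
        # E-step: posterior P(z | f, t) under the current model
        joint = Pf_z[:, :, None] * Pz_t[None, :, :]          # (F, Z, T)
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: reweight the posterior by the observed energy V(f, t)
        weighted = V[:, None, :] * post                      # (F, Z, T)
        Pf_z = weighted.sum(axis=2)
        Pf_z /= Pf_z.sum(axis=0, keepdims=True) + eps
        Pz_t = weighted.sum(axis=0)
        Pz_t /= Pz_t.sum(axis=0, keepdims=True) + eps
    return Pf_z, Pz_t, Pt

# Tiny example on a random nonnegative "spectrogram".
V = np.abs(np.random.default_rng(2).normal(size=(64, 80)))
Pf_z, Pz_t, Pt = plca(V, n_components=5)
approx = (Pf_z @ (Pz_t * Pt)) * V.sum()          # rescale back to V's energy
print("relative reconstruction error:",
      round(float(np.linalg.norm(V - approx) / np.linalg.norm(V)), 3))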


International Conference on Acoustics, Speech, and Signal Processing | 2013

Universal speech models for speaker independent single channel source separation

Dennis L. Sun; Gautham J. Mysore

Supervised and semi-supervised source separation algorithms based on non-negative matrix factorization have been shown to be quite effective. However, they require isolated training examples of one or more sources, which is often difficult to obtain. This limits the practical applicability of these algorithms. We examine the problem of efficiently utilizing general training data in the absence of specific training examples. Specifically, we propose a method to learn a universal speech model from a general corpus of speech and show how to use this model to separate speech from other sound sources. This model is used in lieu of a speech model trained on speaker-dependent training examples, and thus circumvents the aforementioned problem. Our experimental results show that our method achieves nearly the same performance as when speaker-dependent training examples are used. Furthermore, we show that our method improves performance when training data of the non-speech source is available.
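
A rough way to picture the universal model: concatenate small dictionaries learned from many speakers in a general corpus, fit the mixture with the stacked dictionary, and let only a few speaker blocks stay active. The paper enforces this with a block sparsity penalty inside the factorization; the sketch below substitutes a cruder keep-the-most-active-blocks-and-refit step, and all data, sizes, and names are placeholders.

import numpy as np

def fit_activations(V, W, iters=150, eps=1e-9):
    """KL-NMF activations H for V ~= W @ H with the dictionary W held fixed."""
    H = np.full((W.shape[1], V.shape[1]), 1.0 / W.shape[1])
    ones = np.ones_like(V)
    for _ in range(iters):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
    return H

rng = np.random.default_rng(3)
n_freq, k_per_speaker, n_speakers = 64, 10, 20

# 1) Universal model: per-speaker dictionaries learned offline from a general
#    corpus (random placeholders here), stacked side by side.
speaker_dicts = [rng.random((n_freq, k_per_speaker)) for _ in range(n_speakers)]
W_universal = np.hstack([D / D.sum(axis=0) for D in speaker_dicts])

# 2) Fit the universal model to the (speech part of the) mixture spectrogram.
V_mix = np.abs(rng.normal(size=(n_freq, 120)))
H = fit_activations(V_mix, W_universal)

# 3) Block selection: keep the few speaker blocks that explain the most energy,
#    a rough stand-in for the paper's block sparsity regularizer, then refit.
block_energy = H.reshape(n_speakers, k_per_speaker, -1).sum(axis=(1, 2))
keep = np.argsort(block_energy)[-2:]                     # e.g. the top 2 blocks
cols = np.concatenate([np.arange(s * k_per_speaker, (s + 1) * k_per_speaker)
                       for s in keep])
W_active = W_universal[:, cols]
H_active = fit_activations(V_mix, W_active)
speech_estimate = W_active @ H_active
print("selected speaker blocks:", sorted(keep.tolist()))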


International Conference on Acoustics, Speech, and Signal Processing | 2011

A non-negative approach to semi-supervised separation of speech from noise with the use of temporal dynamics

Gautham J. Mysore; Paris Smaragdis

We present a semi-supervised source separation methodology to denoise speech by modeling speech as one source and noise as the other source. We model speech using the recently proposed non-negative hidden Markov model, which uses multiple non-negative dictionaries and a Markov chain to jointly model spectral structure and temporal dynamics of speech. We perform separation of the speech and noise using the recently proposed non-negative factorial hidden Markov model. Although the speech model is learned from training data, the noise model is learned during the separation process and requires no training data. We show that the proposed method achieves superior results to using non-negative spectrogram factorization, which ignores the non-stationarity and temporal dynamics of speech.
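
The practically important part of this setup is that only the speech model needs training data; the noise model is estimated from the noisy recording itself during separation. Ignoring the Markov-chain dynamics that the paper adds on top, that semi-supervised arrangement looks roughly like the KL-NMF sketch below, where the pre-trained speech dictionary stays fixed and only the noise atoms are updated. The spectrograms and dictionary sizes here are hypothetical.

import numpy as np

def semi_supervised_denoise(V_noisy, W_speech, k_noise=10, iters=200, eps=1e-9, seed=0):
    """Fix the pre-trained speech dictionary, learn noise atoms from V_noisy
    itself, and return a soft mask for the speech component."""
    rng = np.random.default_rng(seed)
    F, T = V_noisy.shape
    k_s = W_speech.shape[1]
    W_noise = rng.random((F, k_noise)) + 1e-3
    H = rng.random((k_s + k_noise, T)) + 1e-3
    ones = np.ones_like(V_noisy)
    for _ in range(iters):
        W = np.hstack([W_speech, W_noise])
        WH = W @ H + eps
        H *= (W.T @ (V_noisy / WH)) / (W.T @ ones + eps)
        WH = W @ H + eps
        # only the noise atoms are updated; the speech dictionary stays fixed
        W_noise *= ((V_noisy / WH) @ H[k_s:].T) / (ones @ H[k_s:].T + eps)
        scale = W_noise.sum(axis=0, keepdims=True) + eps
        W_noise /= scale
        H[k_s:] *= scale.T
    speech = W_speech @ H[:k_s]
    noise = W_noise @ H[k_s:]
    return speech / (speech + noise + eps)

# Hypothetical inputs: a speech dictionary trained earlier on clean speech, and
# the magnitude spectrogram of a noisy recording.
rng = np.random.default_rng(4)
W_speech = rng.random((64, 20)); W_speech /= W_speech.sum(axis=0)
V_noisy = np.abs(rng.normal(size=(64, 150)))
mask = semi_supervised_denoise(V_noisy, W_speech)
print("mean speech-mask value:", round(float(mask.mean()), 3))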


International Conference on Latent Variable Analysis and Signal Separation | 2012

Online PLCA for real-time semi-supervised source separation

Zhiyao Duan; Gautham J. Mysore; Paris Smaragdis

Non-negative spectrogram factorization algorithms such as probabilistic latent component analysis (PLCA) have been shown to be quite powerful for source separation. When training data for all of the sources are available, it is trivial to learn their dictionaries beforehand and perform supervised source separation in an online fashion. However, in many real-world scenarios (e.g. speech denoising), training data for one of the sources can be hard to obtain beforehand (e.g. speech). In these cases, we need to perform semi-supervised source separation and learn a dictionary for that source during the separation process. Existing semi-supervised separation approaches are generally offline, i.e. they need to access the entire mixture when updating the dictionary. In this paper, we propose an online approach to adaptively learn this dictionary and separate the mixture over time. This enables us to perform online semi-supervised separation for real-time applications. We demonstrate this approach on real-time speech denoising.
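
The online variant changes the data flow rather than the model: frames are separated as they arrive, and the dictionary of the source without training data is periodically re-estimated from recent frames only. The paper derives proper online PLCA updates from accumulated sufficient statistics; the buffer-based NMF loop below is only meant to convey that streaming structure, with made-up frame data and sizes.

import numpy as np
from collections import deque

def fit_H(v, W, iters=60, eps=1e-9):
    """KL-NMF activations for a single spectrogram column v, dictionary W fixed."""
    H = np.full((W.shape[1], 1), 1.0 / W.shape[1])
    V = v[:, None]
    for _ in range(iters):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
    return H

def update_noise_dict(buffer, W_speech, W_noise, iters=30, eps=1e-9):
    """Re-estimate the unknown-source atoms from the recent-frame buffer."""
    V = np.stack(buffer, axis=1)
    W = np.hstack([W_speech, W_noise])          # warm start from current atoms
    H = np.full((W.shape[1], V.shape[1]), 1.0 / W.shape[1])
    k_s = W_speech.shape[1]
    ones = np.ones_like(V)
    for _ in range(iters):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
        WH = W @ H + eps
        W[:, k_s:] *= ((V / WH) @ H[k_s:].T) / (ones @ H[k_s:].T + eps)
        W[:, k_s:] /= W[:, k_s:].sum(axis=0, keepdims=True) + eps
    return W[:, k_s:]

rng = np.random.default_rng(5)
n_freq, k_speech, k_noise = 64, 20, 8
W_speech = rng.random((n_freq, k_speech)); W_speech /= W_speech.sum(axis=0)
W_noise = rng.random((n_freq, k_noise)); W_noise /= W_noise.sum(axis=0)

buffer = deque(maxlen=30)                 # sliding window of recent frames
for t in range(100):                      # pretend these frames arrive live
    frame = np.abs(rng.normal(size=n_freq))
    buffer.append(frame)
    if t % 10 == 0 and len(buffer) >= 10:
        W_noise = update_noise_dict(buffer, W_speech, W_noise)
    H = fit_H(frame, np.hstack([W_speech, W_noise]))
    speech_frame = W_speech @ H[:k_speech, 0]   # separated frame, output right away
print("speech fraction in last frame:",
      round(float(speech_frame.sum() / (frame.sum() + 1e-9)), 3))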


International Conference on Latent Variable Analysis and Signal Separation | 2010

Non-negative hidden Markov modeling of audio with application to source separation

Gautham J. Mysore; Paris Smaragdis; Bhiksha Raj

In recent years, there has been a great deal of work in modeling audio using non-negative matrix factorization and its probabilistic counterparts as they yield rich models that are very useful for source separation and automatic music transcription. Given a sound source, these algorithms learn a dictionary of spectral vectors to best explain it. This dictionary is however learned in a manner that disregards a very important aspect of sound, its temporal structure. We propose a novel algorithm, the non-negative hidden Markov model (N-HMM), that extends the aforementioned models by jointly learning several small spectral dictionaries as well as a Markov chain that describes the structure of changes between these dictionaries. We also extend this algorithm to the non-negative factorial hidden Markov model (N-FHMM) to model sound mixtures, and demonstrate that it yields superior performance in single channel source separation tasks.
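
The structural idea, several small per-state dictionaries tied together by a Markov chain, can be pictured with the simplified decoding sketch below: score each frame by how well each state's dictionary explains it, then find the most likely state sequence with Viterbi. The actual N-HMM learns dictionaries, transitions, and activations jointly with EM; here the dictionaries and transition matrix are given and the data is synthetic.

import numpy as np

def kl_fit_cost(v, W, iters=50, eps=1e-9):
    """Fit activations h for v ~= W h with KL-NMF and return the fit cost."""
    h = np.full(W.shape[1], 1.0 / W.shape[1])
    for _ in range(iters):
        Wh = W @ h + eps
        h *= (W.T @ (v / Wh)) / (W.sum(axis=0) + eps)
    Wh = W @ h + eps
    return float(np.sum(v * np.log((v + eps) / Wh) - v + Wh))   # KL divergence

def viterbi(costs, log_trans):
    """costs: (n_states, T) per-frame costs (lower is better); returns state path."""
    n_states, T = costs.shape
    delta = -costs[:, 0].copy()
    back = np.zeros((n_states, T), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans               # (from_state, to_state)
        back[:, t] = np.argmax(scores, axis=0)
        delta = scores[back[:, t], np.arange(n_states)] - costs[:, t]
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 2, -1, -1):
        path[t] = back[path[t + 1], t + 1]
    return path

rng = np.random.default_rng(6)
n_freq, n_states, atoms_per_state, T = 64, 4, 5, 60
dicts = [rng.random((n_freq, atoms_per_state)) for _ in range(n_states)]
trans = np.full((n_states, n_states), 0.05) + 0.8 * np.eye(n_states)  # sticky states
trans /= trans.sum(axis=1, keepdims=True)

V = np.abs(rng.normal(size=(n_freq, T)))                  # stand-in spectrogram
costs = np.array([[kl_fit_cost(V[:, t], W) for t in range(T)] for W in dicts])
states = viterbi(costs, np.log(trans))
print("decoded state path (first 20 frames):", states[:20].tolist())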


User Interface Software and Technology | 2013

Content-based tools for editing audio stories

Steve Rubin; Floraine Berthouzoz; Gautham J. Mysore; Wilmot Li; Maneesh Agrawala

Audio stories are an engaging form of communication that combine speech and music into compelling narratives. Existing audio editing tools force story producers to manipulate speech and music tracks via tedious, low-level waveform editing. In contrast, we present a set of tools that analyze the audio content of the speech and music and thereby allow producers to work at a much higher level. Our tools address several challenges in creating audio stories, including (1) navigating and editing speech, (2) selecting appropriate music for the score, and (3) editing the music to complement the speech. Key features include a transcript-based speech editing tool that automatically propagates edits in the transcript text to the corresponding speech track; a music browser that supports searching based on emotion, tempo, key, or timbral similarity to other songs; and music retargeting tools that make it easy to combine sections of music with the speech. We have used our tools to create audio stories from a variety of raw speech sources, including scripted narratives, interviews, and political speeches. Informal feedback from first-time users suggests that our tools are easy to learn and greatly facilitate the process of editing raw footage into a final story.
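
The transcript-based editing tool rests on a simple correspondence: each transcript word carries the time span it occupies in the recording, so deleting words from the text deletes the matching samples from the waveform. The sketch below assumes such a word-level alignment already exists (in practice it would come from forced alignment against a recognizer's output); the alignment data and function names are made up, and a real tool would also crossfade at the cut points.

import numpy as np

# (word, start_seconds, end_seconds) from a hypothetical forced alignment
alignment = [("so", 0.00, 0.21), ("today", 0.21, 0.58), ("um", 0.58, 0.80),
             ("we", 0.80, 0.95), ("talk", 0.95, 1.30), ("about", 1.30, 1.62),
             ("audio", 1.62, 2.05), ("stories", 2.05, 2.60)]

def edit_by_transcript(audio, sample_rate, alignment, words_to_remove):
    """Drop the spans of the listed words and return the edited audio plus an
    alignment whose times refer to the edited timeline."""
    removed = set(words_to_remove)
    pieces, new_alignment, cursor = [], [], 0.0
    for word, start, end in alignment:
        if word in removed:
            continue                      # a real tool would crossfade the cut
        s, e = int(start * sample_rate), int(end * sample_rate)
        pieces.append(audio[s:e])
        new_alignment.append((word, cursor, cursor + (end - start)))
        cursor += end - start
    return np.concatenate(pieces), new_alignment

# Hypothetical recording: 2.6 s of audio at 16 kHz (noise stands in for speech).
sr = 16000
audio = np.random.default_rng(7).normal(size=int(2.6 * sr)).astype(np.float32)
edited, new_alignment = edit_by_transcript(audio, sr, alignment, {"so", "um"})
print("original:", len(audio) / sr, "s   edited:", round(len(edited) / sr, 2), "s")
print("edited transcript:", " ".join(word for word, _, _ in new_alignment))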


International Conference on Acoustics, Speech, and Signal Processing | 2012

Clustering and synchronizing multi-camera video via landmark cross-correlation

Nicholas J. Bryan; Paris Smaragdis; Gautham J. Mysore

We propose a method to both identify and synchronize multi-camera video recordings within a large collection of video and/or audio files. Landmark-based audio fingerprinting is used to match multiple recordings of the same event together and time-synchronize each file within the groups. Compared to prior work, we offer improvements to event identification and a new synchronization refinement method that resolves inconsistent estimates and allows non-overlapping content to be synchronized within larger groups of recordings. Furthermore, the audio fingerprinting-based synchronization is shown to be equivalent to an efficient and scalable time-difference-of-arrival method using cross-correlation performed on a non-linearly transformed signal.
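
The equivalence noted at the end, between fingerprint-based synchronization and a time-difference-of-arrival estimate, is easy to picture with plain cross-correlation: the lag that maximizes the cross-correlation of two recordings of the same event is the offset needed to align them. The sketch below does exactly that on synthetic signals; real recordings would first be fingerprinted or onset-emphasized, and the sample rate and offset used here are arbitrary.

import numpy as np

def estimate_offset(a, b, sample_rate):
    """Offset (seconds) by which recording `b` lags recording `a`,
    found at the peak of their FFT-based cross-correlation."""
    n = len(a) + len(b) - 1
    n_fft = 1 << (n - 1).bit_length()              # next power of two
    corr = np.fft.irfft(np.fft.rfft(a, n_fft) * np.conj(np.fft.rfft(b, n_fft)), n_fft)
    corr = np.concatenate([corr[-(len(b) - 1):], corr[:len(a)]])   # lags -(Nb-1)..(Na-1)
    lag = np.argmax(corr) - (len(b) - 1)           # amount by which `a` lags `b`
    return -lag / sample_rate

# Synthetic "event" captured by two devices; it shows up 1.25 s later in rec_b.
sr = 8000
rng = np.random.default_rng(8)
event = rng.normal(size=6 * sr)
rec_a = event + 0.05 * rng.normal(size=len(event))
rec_b = np.concatenate([np.zeros(int(1.25 * sr)), event])
rec_b += 0.05 * rng.normal(size=len(rec_b))
print("estimated offset of rec_b relative to rec_a:",
      round(estimate_offset(rec_a, rec_b, sr), 3), "s")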


International Conference on Acoustics, Speech, and Signal Processing | 2009

Relative pitch estimation of multiple instruments

Gautham J. Mysore; Paris Smaragdis

We present an algorithm based on probabilistic latent component analysis and employ it for relative pitch estimation of multiple instruments in polyphonic music. A multilayered positive deconvolution is performed concurrently on mixture constant-Q transforms to obtain a relative pitch track and timbral signature for each instrument. Initial experimental results on mixtures of two instruments are quite promising and show high levels of accuracy.
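
The reason a deconvolution works here is that, in a constant-Q (log-frequency) representation, a pitch change is approximately a vertical shift of an instrument's spectral pattern. The sketch below illustrates only that observation for a single synthetic instrument, reading off a relative pitch track as the per-frame shift that best aligns each frame with a reference frame; it is not the paper's multilayered shift-invariant PLCA, which handles several instruments at once.

import numpy as np

def relative_pitch_track(logspec):
    """logspec: (n_bins, n_frames) nonnegative log-frequency spectrogram.
    Returns each frame's shift (in bins) relative to the first frame."""
    ref = logspec[:, 0]
    n = logspec.shape[0]
    track = []
    for t in range(logspec.shape[1]):
        # cross-correlate frame t with the reference over all circular shifts
        corr = np.fft.irfft(np.fft.rfft(logspec[:, t]) * np.conj(np.fft.rfft(ref)), n)
        shift = int(np.argmax(corr))
        track.append(shift if shift <= n // 2 else shift - n)
    return np.array(track)

# Synthetic instrument: a fixed harmonic pattern that glides upward over time.
n_bins, n_frames, bins_per_semitone = 120, 50, 3
pattern = np.zeros(n_bins)
for h, amp in [(0, 1.0), (12 * bins_per_semitone, 0.5), (19 * bins_per_semitone, 0.3)]:
    pattern[20 + h] = amp                        # fundamental plus two "harmonics"
true_shift = np.round(np.linspace(0, 12, n_frames) * bins_per_semitone).astype(int)
logspec = np.stack([np.roll(pattern, s) for s in true_shift], axis=1)

estimated = relative_pitch_track(logspec)
print("max error (bins):", int(np.abs(estimated - true_shift).max()))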

Collaboration


Dive into Gautham J. Mysore's collaborations.

Top Co-Authors

Bryan Pardo

Northwestern University
