Publication


Featured research published by Daniel Rudoy.


Journal of the Acoustical Society of America | 2012

Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking.

Daryush D. Mehta; Daniel Rudoy; Patrick J. Wolfe

Vocal tract resonance characteristics in acoustic speech signals are classically tracked using frame-by-frame point estimates of formant frequencies followed by candidate selection and smoothing using dynamic programming methods that minimize ad hoc cost functions. The goal of the current work is to provide both point estimates and associated uncertainties of center frequencies and bandwidths in a statistically principled state-space framework. Extended Kalman (K) algorithms take advantage of a linearized mapping to infer formant and antiformant parameters from frame-based estimates of autoregressive moving average (ARMA) cepstral coefficients. Error analysis of KARMA, WaveSurfer, and Praat is accomplished in the all-pole case using a manually marked formant database and synthesized speech waveforms. KARMA formant tracks exhibit lower overall root-mean-square error relative to the two benchmark algorithms, with the ability to modify parameters in a controlled manner to trade off bias and variance. Antiformant tracking performance of KARMA is illustrated using synthesized and spoken nasal phonemes. The simultaneous tracking of uncertainty levels enables practitioners to recognize time-varying confidence in parameters of interest and adjust algorithmic settings accordingly.
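As a rough illustration of how a state-space tracker yields both a point estimate and an associated uncertainty, here is a minimal sketch of a scalar linear Kalman filter. It is not the KARMA algorithm: the abstract describes an extended Kalman filter driven by ARMA cepstral coefficients, whereas this sketch assumes a single slowly drifting quantity observed directly in noise. The function name `kalman_track` and the parameters `q` (process noise variance) and `r` (measurement noise variance) are hypothetical.

```python
def kalman_track(measurements, q=0.0, r=1.0, x0=0.0, p0=1.0):
    """Scalar random-walk Kalman filter (illustrative, not KARMA).

    Returns a list of (estimate, variance) pairs, one per measurement.
    """
    x, p = x0, p0
    track = []
    for z in measurements:
        # Predict: a random-walk state keeps its mean; its variance grows by q.
        p = p + q
        # Update: weight prediction and measurement by their variances.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        track.append((x, p))
    return track


if __name__ == "__main__":
    for estimate, variance in kalman_track([1.0, 1.2, 0.9, 1.1]):
        print(f"estimate={estimate:.3f}  variance={variance:.3f}")
```

In this toy setting, raising `q` lets the estimate follow the measurements more closely at the cost of a larger reported variance, which is one simple way to see the kind of bias-variance trade-off the abstract mentions.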


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Time-Varying Autoregressions in Speech: Detection Theory and Applications

Daniel Rudoy; Thomas F. Quatieri; Patrick J. Wolfe

This paper develops a general detection theory for speech analysis based on time-varying autoregressive models, which themselves generalize the classical linear predictive speech analysis framework. This theory leads to a computationally efficient decision-theoretic procedure that may be applied to detect the presence of vocal tract variation in speech waveform data. A corresponding generalized likelihood ratio test is derived and studied both empirically for short data records, using formant-like synthetic examples, and asymptotically, leading to constant false alarm rate hypothesis tests for changes in vocal tract configuration. Two in-depth case studies then serve to illustrate the practical efficacy of this procedure across different time scales of speech dynamics: first, the detection of formant changes on the scale of tens of milliseconds of data, and second, the identification of glottal opening and closing instants on time scales below ten milliseconds.
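The structure of such a test can be sketched in a generic textbook form. The notation below is an assumption and is not taken from the paper: $p$ is the autoregressive order, $f_0, \dots, f_q$ are basis functions with $f_0 \equiv 1$, $N$ is the number of samples used in the fit, and $\mathrm{RSS}_0$, $\mathrm{RSS}_1$ are the least-squares residual sums of squares under the time-invariant and time-varying models.

```latex
% Time-varying autoregression with basis-expanded coefficients (assumed notation)
x[n] = \sum_{i=1}^{p} a_i[n]\, x[n-i] + \sigma\, w[n],
\qquad
a_i[n] = \sum_{j=0}^{q} \alpha_{ij}\, f_j[n]

% Hypotheses: no vocal tract variation versus some variation
H_0 : \alpha_{ij} = 0 \ \text{for all } i \text{ and all } j \ge 1
\qquad \text{vs.} \qquad
H_1 : \alpha_{ij} \ne 0 \ \text{for some } i \text{ and some } j \ge 1

% Generalized likelihood ratio statistic for Gaussian innovations with unknown variance
T(x) = N \log \frac{\mathrm{RSS}_0}{\mathrm{RSS}_1}

% Under H_0 and standard regularity conditions, T is asymptotically chi-squared
% with pq degrees of freedom, so a constant false alarm rate \alpha is obtained
% by choosing the threshold \gamma such that
\Pr\left( \chi^2_{pq} > \gamma \right) = \alpha
```

Because the asymptotic null distribution does not depend on the unknown parameters under $H_0$, the threshold can be fixed in advance for a chosen false alarm rate.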


IEEE Transactions on Signal Processing | 2010

Superposition Frames for Adaptive Time-Frequency Analysis and Fast Reconstruction

Daniel Rudoy; Prabahan Basu; Patrick J. Wolfe

In this paper, we introduce a broad family of adaptive, linear time-frequency representations termed superposition frames, and show that they admit desirable fast overlap-add reconstruction properties akin to standard short-time Fourier techniques. This approach stands in contrast to many adaptive time-frequency representations in the existing literature, which, while more flexible than standard fixed-resolution approaches, typically fail to provide for efficient reconstruction and often lack the regular structure necessary for precise frame-theoretic analysis. Our main technical contributions come through the development of properties which ensure that our superposition construction provides for a numerically stable, invertible signal representation. Our primary algorithmic contributions come via the introduction and discussion of two signal adaptation schemes based on greedy selection and dynamic programming, respectively. We conclude with two short enhancement examples that serve to highlight potential applications of our approach.
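For context, the fast overlap-add reconstruction of the standard fixed-resolution case can be sketched in a few lines. This is not the superposition-frame construction itself: it only assumes a periodic Hann window at 50% overlap, whose shifted copies sum to one, so that adding the windowed frames back together recovers the interior of the signal. The function names are hypothetical.

```python
import math


def periodic_hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly one."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add_roundtrip(signal, frame_len=8):
    """Window the signal into half-overlapping frames and add them back."""
    hop = frame_len // 2
    window = periodic_hann(frame_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Analysis: extract one windowed frame.
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        # Synthesis: add the frame back at the same offset.
        for i, value in enumerate(frame):
            out[start + i] += value
    return out


if __name__ == "__main__":
    x = [math.sin(0.3 * n) for n in range(32)]
    y = overlap_add_roundtrip(x)
    # Samples covered by two frames are recovered; the first and last
    # half-frame are covered by only one frame and are not.
    print(max(abs(a - b) for a, b in zip(x[4:28], y[4:28])))
```

In a real analysis-synthesis system a transform and some modification would sit between the two steps; the sketch omits both to isolate the reconstruction property.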


International Conference on Acoustics, Speech, and Signal Processing | 2008

Adaptive short-time analysis-synthesis for speech enhancement

Daniel Rudoy; Prabahan Basu; Thomas F. Quatieri; Bob Dunn; Patrick J. Wolfe

In this paper we present a new adaptive short-time Fourier analysis-synthesis scheme and demonstrate its efficacy in speech enhancement. While a number of adaptive analyses have previously been proposed to overcome the limitations of fixed-resolution schemes, we propose here a modified overlap-add procedure that enables efficient resynthesis. Our adaptation scheme extends earlier work using local measures of time-frequency concentration, and is applicable to power spectral density estimation for the case of noisy speech. We provide evidence of increased gains in signal-to-noise ratios for synthetic signals as well as empirical evidence of reduced musical noise based on expert listening tests for voiced and phonetically balanced utterances observed in noise, relative to a standard baseline speech enhancement system whose time-frequency resolution is fixed.


Journal of the Royal Statistical Society, Series C (Applied Statistics) | 2010

Bayesian change‐point analysis for atomic force microscopy and soft material indentation

Daniel Rudoy; Shelten G. Yuen; Robert D. Howe; Patrick J. Wolfe

Material indentation studies, in which a probe is brought into controlled physical contact with an experimental sample, have long been a primary means by which scientists characterize the mechanical properties of materials. More recently, the advent of atomic force microscopy, which operates on the same fundamental principle, has in turn revolutionized the nanoscale analysis of soft biomaterials such as cells and tissues. The paper addresses the inferential problems that are associated with material indentation and atomic force microscopy, through a framework for the change-point analysis of pre-contact and post-contact data that is applicable to experiments across a variety of physical scales. A hierarchical Bayesian model is proposed to account for experimentally observed change-point smoothness constraints and measurement error variability, with efficient Monte Carlo methods developed and employed to realize inference via posterior sampling for parameters such as Young's modulus, which is a key quantifier of material stiffness. These results are the first to provide the materials science community with rigorous inference procedures and quantification of uncertainty, via optimized and fully automated high-throughput algorithms, implemented as the publicly available software package BayesCP. To demonstrate the consistent accuracy and wide applicability of this approach, results are shown for a variety of data sets from both macromaterials and micromaterials experiments (including silicone, neurons and red blood cells) conducted by the authors and others. Copyright (c) 2010 Royal Statistical Society.


International Conference on Acoustics, Speech, and Signal Processing | 2009

A nonparametric test for stationarity based on local Fourier analysis

Prabahan Basu; Daniel Rudoy; Patrick J. Wolfe

In this paper we propose a nonparametric hypothesis test for stationarity based on local Fourier analysis. We employ a test statistic that measures the variation of time-localized estimates of the power spectral density of an observed random process. For the case of a white Gaussian noise process, we characterize the asymptotic distribution of this statistic under the null hypothesis of stationarity, and use it to directly set test thresholds corresponding to constant false alarm rates. For other cases, we introduce a simple procedure to simulate from the null distribution of interest. After validating the procedure on synthetic examples, we demonstrate one potential use for the test as a method of obtaining a signal-adaptive means of local Fourier analysis and corresponding signal enhancement scheme.


International Conference on Acoustics, Speech, and Signal Processing | 2010

Autoregressive modeling of voiced speech

Maria A. Berezina; Daniel Rudoy; Patrick J. Wolfe

It is well known that the classical linear predictive model for speech fails to take into account the quasi-periodic nature of the glottal flow typical of voiced speech. In this article we describe how to incorporate an estimate of the glottal flow directly into the traditional linear prediction framework, through the use of flexible basis function expansions that admit efficient estimation procedures. As we show, this not only allows for improved estimation of vocal tract transfer function parameters in a manner that is robust to pitch variation, but also precludes the need for nonlinear optimization procedures typically required in glottal waveform estimation. We illustrate our approach with experiments using synthesized and real speech waveforms, and show how it may be used to directly estimate the relative degree of voicing and aspiration present in a given utterance.
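The classical linear predictive baseline that this abstract starts from can be sketched as follows. This is the textbook autocorrelation method solved with the Levinson-Durbin recursion, not the glottal-flow extension the paper proposes; the function names are hypothetical, and the sketch assumes a frame with nonzero energy.

```python
def autocorrelation(x, max_lag):
    """Biased (unnormalized) autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for the given autocorrelation.

    Returns (a, err), where x[n] is predicted as sum(a[k-1] * x[n-k])
    for k = 1..order and err is the final prediction error energy.
    """
    a = []
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        # Order update of the predictor coefficients.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


def lpc(x, order):
    """Linear prediction coefficients of x by the autocorrelation method."""
    return levinson_durbin(autocorrelation(x, order), order)


if __name__ == "__main__":
    # Impulse response of a one-pole filter with pole at 0.5.
    x = [0.5 ** n for n in range(50)]
    coeffs, err = lpc(x, 2)
    print(coeffs, err)
```

The model has no term for the excitation, which is the gap the abstract points to: with voiced speech, the quasi-periodic glottal source is absorbed into the prediction error or biases the coefficients.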


International Conference on Acoustics, Speech, and Signal Processing | 2007

Multi-Scale MCMC Methods for Sampling from Products of Gaussian Mixtures

Daniel Rudoy; Patrick J. Wolfe

This paper addresses the important and ubiquitous problem of sampling from a product of Gaussian mixtures. An exact solution is often computationally infeasible, thus motivating the development of efficient sampling schemes. However, naive Markov chain Monte Carlo algorithms perform poorly in cases where the product mixture is highly multi-modal. In this paper we follow the trend of recent work utilizing multi-scale sampling methods, and propose two new multi-scale Markov chain Monte Carlo algorithms based on simulated and parallel tempering. Empirical results indicate that for the same computational budget, this class of methods can improve performance in cases with widely separated modes.
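A minimal sketch of parallel tempering on a toy one-dimensional product of two Gaussian mixtures is given below. It illustrates the general tempering idea only and is not one of the multi-scale algorithms proposed in the paper; the mixture means, the temperature ladder `betas`, the step size, and all function names are assumptions chosen for illustration.

```python
import math
import random


def log_mixture(x, means, sd=1.0):
    """Log density, up to a constant, of an equal-weight 1-D Gaussian mixture."""
    terms = [-0.5 * ((x - m) / sd) ** 2 for m in means]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def log_target(x):
    # A product of mixtures is a sum of their log densities (toy example).
    return log_mixture(x, [-3.0, 3.0]) + log_mixture(x, [-3.2, 2.8])


def parallel_tempering(n_iter=5000, betas=(1.0, 0.5, 0.25, 0.1, 0.03),
                       step=1.0, seed=0):
    """Return the states visited by the beta = 1 chain."""
    rng = random.Random(seed)
    states = [0.0] * len(betas)
    samples = []
    for _ in range(n_iter):
        # Random-walk Metropolis update within each tempered chain.
        for i, beta in enumerate(betas):
            prop = states[i] + rng.gauss(0.0, step)
            log_ratio = beta * (log_target(prop) - log_target(states[i]))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                states[i] = prop
        # Propose swapping the states of one random adjacent pair of chains.
        i = rng.randrange(len(betas) - 1)
        log_swap = (betas[i] - betas[i + 1]) * (
            log_target(states[i + 1]) - log_target(states[i]))
        if rng.random() < math.exp(min(0.0, log_swap)):
            states[i], states[i + 1] = states[i + 1], states[i]
        samples.append(states[0])
    return samples


if __name__ == "__main__":
    draws = parallel_tempering()
    left = sum(1 for d in draws if d < 0.0) / len(draws)
    print(f"fraction of draws in the left mode: {left:.2f}")
```

The flattened chains (small `beta`) cross between the separated modes easily, and the swap moves pass those crossings down to the `beta = 1` chain, which a single random-walk chain with the same step size would rarely achieve.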


Asilomar Conference on Signals, Systems and Computers | 2006

Monte Carlo Methods for Multi-Modal Distributions

Daniel Rudoy; Patrick J. Wolfe

This paper explores auxiliary variable strategies for designing Monte Carlo algorithms to sample from multi-modal distributions. Naive importance sampling and Markov chain Monte Carlo methods perform poorly in such situations, motivating the development of alternative methods, in particular those based on a multi-scale representation of the target distribution. Here we present a novel multi-scale algorithm for sampling from products of Gaussian mixtures, a canonical example in which multi-modality arises frequently in practice. This algorithm is based on a fusion of importance sampling and Markov chain Monte Carlo steps through the recently proposed framework of sequential Monte Carlo samplers. Simulation results indicate that in comparison to either form of sampling technique alone, the resulting algorithm performs more robustly in multi-modal cases than those previously reported in the literature.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Joint source-filter modeling using flexible basis functions

Daryush D. Mehta; Daniel Rudoy; Patrick J. Wolfe

Building on recent work on joint source-filter analysis of speech waveforms, we explore improvements to an autoregressive model with exogenous inputs represented by flexible basis functions. Following a brief review of the maximum likelihood estimators of the model parameters, the Cramér-Rao bounds are derived to provide evidence for the challenging nature of estimating source and filter characteristics with overlapping spectra. Wavelet expansion of the exogenous inputs is employed, and the selection of an appropriate subset of wavelets is described as an online, signal-adaptive approach. Results from synthesized and real vowel analysis illustrate the promise of iterative wavelet shrinkage using soft and hard thresholding and an alternative regularization method.
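The two shrinkage rules named in the abstract have standard definitions, sketched here on a plain list of coefficients. The function names and the threshold argument `t` are hypothetical, and the sketch does not reproduce the paper's iterative procedure or its wavelet transform.

```python
def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; values in [-t, t] become 0."""
    out = []
    for c in coeffs:
        if c > t:
            out.append(c - t)
        elif c < -t:
            out.append(c + t)
        else:
            out.append(0.0)
    return out


def hard_threshold(coeffs, t):
    """Keep coefficients with magnitude above t unchanged; zero the rest."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


if __name__ == "__main__":
    w = [3.0, -0.5, -2.0]
    print(soft_threshold(w, 1.0))
    print(hard_threshold(w, 1.0))
```

Soft thresholding biases the surviving coefficients toward zero but varies continuously with the input, while hard thresholding leaves survivors untouched at the cost of a discontinuity at the threshold.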

Collaboration


Top Co-Authors

Thomas F. Quatieri

Massachusetts Institute of Technology


Bob Dunn

Massachusetts Institute of Technology


Daniel N. Spendley

Air Force Research Laboratory


Maria A. Berezina

Massachusetts Institute of Technology