
Publications


Featured research published by Nicolas Malyska.


EURASIP Journal on Advances in Signal Processing | 2011

Phonologically-Based Biomarkers for Major Depressive Disorder

Andrea Carolina Trevino; Thomas F. Quatieri; Nicolas Malyska

Of increasing importance in the civilian and military population is the recognition of major depressive disorder at its earliest stages and intervention before the onset of severe symptoms. Toward the goal of more effective monitoring of depression severity, we introduce vocal biomarkers that are derived automatically from phonologically-based measures of speech rate. To assess our measures, we use a 35-speaker free-response speech database of subjects treated for depression over a 6-week duration. We find that dissecting average measures of speech rate into phone-specific characteristics and, in particular, combined phone-duration measures uncovers stronger relationships between speech rate and depression severity than global measures previously reported for a speech-rate biomarker. Results of this study are supported by correlation of our measures with depression severity and classification of depression state with these vocal measures. Our approach provides a general framework for analyzing individual symptom categories through phonological units, and supports the premise that speaking rate can be an indicator of psychomotor retardation severity.
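The abstract's central move is to split a single global speech-rate measure into per-phone duration statistics. A minimal sketch of that bookkeeping, assuming a forced alignment of (phone, start, end) tuples is available (the `phone_duration_stats` helper and the toy alignment are hypothetical, not from the paper):

```python
from collections import defaultdict

def phone_duration_stats(alignment):
    """Mean duration per phone from a (phone, start_s, end_s) forced alignment."""
    durations = defaultdict(list)
    for phone, start, end in alignment:
        durations[phone].append(end - start)
    return {phone: sum(d) / len(d) for phone, d in durations.items()}

# Toy alignment for one utterance (times in seconds):
alignment = [("AA", 0.00, 0.12), ("T", 0.12, 0.18), ("AA", 0.30, 0.44)]
mean_durations = phone_duration_stats(alignment)
```

Per-phone means like these, rather than one utterance-level rate, are the kind of phone-specific characteristics the study correlates with depression severity.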


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2005

Automatic dysphonia recognition using biologically-inspired amplitude-modulation features

Nicolas Malyska; Thomas F. Quatieri; Douglas E. Sturim

A dysphonia, or disorder of the mechanisms of phonation in the larynx, can create time-varying amplitude fluctuations in the voice. A model for band-dependent analysis of this amplitude modulation (AM) phenomenon in dysphonic speech is developed from a traditional communications engineering perspective. This perspective challenges current dysphonia analysis methods that analyze AM in the time-domain signal. An automatic dysphonia recognition system is designed to exploit AM in voice using a biologically inspired model of the inferior colliculus. This system, built upon a Gaussian-mixture-model (GMM) classification backend, recognizes the presence of dysphonia in the voice signal. Recognition experiments using data obtained from the Kay Elemetrics voice disorders database suggest that the system provides complementary information to state-of-the-art mel-cepstral features. We present dysphonia recognition as an approach to developing features that capture glottal source differences in normal speech.


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Spectral Representations of Nonmodal Phonation

Nicolas Malyska; Thomas F. Quatieri

Regions of nonmodal phonation, which exhibit deviations from uniform glottal-pulse periods and amplitudes, occur often in speech and convey information about linguistic content, speaker identity, and vocal health. Some aspects of these deviations are random, including small perturbations, known as jitter and shimmer, as well as more significant aperiodicities. Other aspects are deterministic, including repeating patterns of fluctuations such as diplophonia and triplophonia. These deviations are often the source of misinterpretation of the spectrum. In this paper, we introduce a general signal-processing framework for interpreting the effects of both stochastic and deterministic aspects of nonmodality on the short-time spectrum. As an example, we show that the spectrum is sensitive to even small perturbations in the timing and amplitudes of glottal pulses. In addition, we illustrate important characteristics that can arise in the spectrum, including apparent shifting of the harmonics and the appearance of multiple pitches. For stochastic perturbations, we arrive at a formulation of the power-spectral density as the sum of a low-pass line spectrum and a high-pass noise floor. Our findings are relevant to a number of speech-processing areas including linear-prediction analysis, sinusoidal analysis-synthesis, spectrally derived features, and the analysis of disordered voices.
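The abstract's claim that even small pulse-timing perturbations reshape the short-time spectrum can be illustrated numerically. This is a toy demonstration, not the paper's derivation: a perfectly periodic impulse train cancels exactly at inter-harmonic DFT bins, while delaying alternate pulses by a single sample leaks energy into those bins, the beginning of the noise floor the paper formalizes:

```python
import cmath

def dft_mag(x, k):
    """Magnitude of DFT bin k of sequence x."""
    N = len(x)
    return abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                   for n in range(N)))

def pulse_train(n_samples, period, offsets):
    """Unit impulse train; pulse j lands at j*period + offsets[j % len(offsets)]."""
    x = [0.0] * n_samples
    j = 0
    while j * period < n_samples:
        x[j * period + offsets[j % len(offsets)]] = 1.0
        j += 1
    return x

N, period = 1024, 64                       # F0 falls on DFT bin N / period = 16
clean = pulse_train(N, period, [0])
jittered = pulse_train(N, period, [0, 1])  # alternate pulses arrive 1 sample late

# clean: energy sits on harmonics of bin 16 and cancels exactly between them.
# jittered: the one-sample timing deviation puts energy into inter-harmonic
# bins (e.g. bin 24), i.e. a floor appears between the spectral lines.
```

The deterministic alternating offset also hints at the paper's other point: repeating perturbation patterns (as in diplophonia) create structured spectral components rather than pure noise.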


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | 2009

Sinewave parameter estimation using the fast Fan-Chirp Transform

Robert B. Dunn; Thomas F. Quatieri; Nicolas Malyska

Sinewave analysis/synthesis has long been an important tool for audio analysis, modification and synthesis [1]. The recently introduced Fan-Chirp Transform (FChT) [2,3] has been shown to improve the fidelity of sinewave parameter estimates for a harmonic audio signal with rapid frequency modulation [4]. A fast version of the FChT [3] reduces computation but this algorithm presents two factors that affect sinewave parameter estimation. The phase of the fast FChT does not match the phase of the original continuous-time transform and this interferes with the estimation of sinewave phases. Also, the fast FChT requires an interpolation of the input signal and the choice of interpolator affects the speed of the transform and accuracy of the estimated sinewave parameters. In this paper we demonstrate how to modify the phase of the fast FChT such that it can be used to estimate sinewave phases, and we explore the use of various interpolators demonstrating the tradeoff between transform speed and sinewave parameter accuracy.


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | 2003

Auditory signal processing as a basis for speaker recognition

Thomas F. Quatieri; Nicolas Malyska; Douglas E. Sturim

We exploit models of auditory signal processing at different levels along the auditory pathway for use in speaker recognition. A low-level nonlinear model, at the cochlea, provides accentuated signal dynamics, while a high-level model, at the inferior colliculus, provides frequency analysis of modulation components that reveals an additional temporal structure. A variety of features are derived from the low-level dynamic and high-level modulation signals. Fusion of likelihood scores from feature sets at different auditory levels with scores from standard Mel-cepstral features provides an encouraging speaker recognition performance gain over use of the Mel-cepstrum alone with corpora from land-line and cellular telephone communications.
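The fusion step the abstract describes is late (score-level) fusion: each feature set produces its own likelihood score, and the scores are combined before the final decision. A minimal sketch, assuming a simple weighted sum (the weights and score values below are illustrative, not from the paper):

```python
def fuse_scores(scores, weights):
    """Weighted sum of per-subsystem likelihood scores for one trial."""
    assert len(scores) == len(weights)
    return sum(w * s for w, s in zip(weights, scores))

# e.g. cochlear-model, inferior-colliculus, and mel-cepstral subsystem scores,
# with weights that would in practice be tuned on held-out data:
fused = fuse_scores([0.8, 0.3, 1.2], [0.25, 0.25, 0.5])
```

Score-level fusion lets subsystems with very different feature dimensionalities contribute on equal footing, which is why it is a common way to combine auditory-model features with standard mel-cepstra.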


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

A time-warping framework for speech turbulence-noise component estimation during aperiodic phonation

Nicolas Malyska; Thomas F. Quatieri

The accurate estimation of turbulence noise affects many areas of speech processing including separate modification of the noise component, analysis of degree of speech aspiration for treating pathological voice, the automatic labeling of speech voicing, as well as speaker characterization and recognition. Previous work in the literature has provided methods by which such a high-quality noise component may be estimated in near-periodic speech, but it is known that these methods tend to leak aperiodic phonation (with even slight deviations from periodicity) into the noise-component estimate. In this paper, we improve upon existing algorithms in conditions of aperiodicity by introducing a time-warping based approach to speech noise-component estimation, demonstrating the results on both natural and synthetic speech examples.


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2015

Analysis of factors affecting system performance in the ASpIRE challenge

Jennifer T. Melot; Nicolas Malyska; Jessica Ray; Wade Shen

This paper presents an analysis of factors affecting system performance in the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge. In particular, overall word error rate (WER) of the solver systems is analyzed as a function of room, distance between talker and microphone, and microphone type. We also analyze speech activity detection performance of the solver systems and investigate its relationship to WER. The primary goal of the paper is to provide insight into the factors affecting system performance in the ASpIRE evaluation set across many systems given annotations and metadata that are not available to the solvers. This analysis will inform the design of future challenges and provide insight into the efficacy of current solutions addressing noisy reverberant speech in mismatched conditions.
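The metric at the center of this analysis, word error rate, is the word-level Levenshtein distance normalized by the reference length: WER = (substitutions + deletions + insertions) / reference word count. A self-contained sketch of the standard dynamic program (a generic implementation, not the ASpIRE scoring pipeline):

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + deletions + insertions) / len(ref)."""
    r, h = ref.split(), hyp.split()
    # Standard Levenshtein dynamic program over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                     # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                     # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, sub)
    return d[-1][-1] / len(r)
```

Because insertions are counted, WER can exceed 1.0, which matters when comparing solver systems on heavily reverberant audio that triggers spurious output.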


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2010

Preserving the character of perturbations in scaled pitch contours

Thomas A. Baran; Nicolas Malyska; Thomas F. Quatieri

The global and fine dynamic components of a pitch contour in voice production, as in the speaking and singing voice, are important for both the meaning and character of an utterance. In speech, for example, slow pitch inflections, rapid pitch accents, and irregular regions all comprise the pitch contour. In applications where all components of a pitch contour are stretched or compressed in the same way, as for example in time-scale modification, an unnatural scaled contour may result. In this paper, we develop a framework for scaling pitch contours, motivated by the goal of maintaining naturalness in time-scale modification of voice. Specifically, we develop a multi-band algorithm to independently modify the slow trajectory and fast perturbation components of a contour for a more natural synthesis, and we present examples where pitch contours representative of speaking and singing voice are lengthened. In the speaking voice, the frequency content of flutter or irregularity is maintained, while slow pitch inflection is simply stretched or compressed. In the singing voice, rapid vibrato is preserved while slower note-to-note variation is scaled as desired.
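The core idea, stretch the slow trajectory but extend (rather than stretch) the fast perturbation so its frequency content is preserved, can be sketched in a few lines. This is a simplified stand-in for the paper's multi-band algorithm: a moving average plays the role of the low-pass band, linear interpolation the role of the time-scaler, and the residual is tiled to fill the new length:

```python
def moving_average(x, win):
    """Crude low-pass: centered moving average standing in for the slow band."""
    half = win // 2
    return [sum(x[max(0, i - half):i + half + 1]) /
            len(x[max(0, i - half):i + half + 1]) for i in range(len(x))]

def stretch(x, factor):
    """Linear-interpolation resampling of x to round(len(x) * factor) points."""
    n_out = round(len(x) * factor)
    out = []
    for i in range(n_out):
        pos = i * (len(x) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[hi] * frac)
    return out

def scale_contour(f0, factor, win=9):
    slow = moving_average(f0, win)
    fast = [f - s for f, s in zip(f0, slow)]       # flutter/vibrato residual
    slow_stretched = stretch(slow, factor)
    # Tile the residual instead of stretching it, so its rate is unchanged:
    fast_extended = [fast[i % len(fast)] for i in range(len(slow_stretched))]
    return [s + f for s, f in zip(slow_stretched, fast_extended)]

contour = [100.0] * 50                     # flat 100 Hz toy contour
doubled = scale_contour(contour, 2.0)      # twice as long; perturbation rate kept
```

Tiling the residual is the simplest way to keep its spectrum fixed; the paper's synthesis is more careful, but the band-split-then-treat-bands-differently structure is the same.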


Conference of the International Speech Communication Association (Interspeech) | 2016

Relating Estimated Cyclic Spectral Peak Frequency to Measured Epilarynx Length Using Magnetic Resonance Imaging

Elizabeth Godoy; Andrew Dumas; Jennifer T. Melot; Nicolas Malyska; Thomas F. Quatieri

The epilarynx plays an important role in speech production, carrying information about the individual speaker and manner of articulation. However, precise acoustic behavior of this lower vocal tract structure is difficult to establish. Focusing on acoustics observable in natural speech, recent spectral processing techniques isolate a unique resonance with characteristics of the epilarynx previously shown via simulation, specifically cyclicity (i.e. energy differences between the closed and open phases of the glottal cycle) in a 3-5 kHz region observed across vowels. Using Magnetic Resonance Imaging (MRI), the present work relates this estimated cyclic peak frequency to measured epilarynx length. Assuming a simple quarter wavelength relationship, the cavity length estimated from the cyclic peak frequency is shown to be directly proportional (linear fit slope = 1.1) and highly correlated (ρ = 0.85, pval < 10) to the measured epilarynx length across speakers. Results are discussed, as are implications in speech science and application domains.
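The quarter-wavelength assumption in the abstract is simple enough to work through: a tube closed at one end resonates at f = c / (4L), so an observed peak frequency implies a cavity length L = c / (4f). A small worked sketch (the speed-of-sound value is an assumption, roughly that of warm, humid air in the vocal tract; it is not quoted from the paper):

```python
SPEED_OF_SOUND_CM_PER_S = 35_000.0  # assumed speed of sound in the tract (cm/s)

def cavity_length_cm(peak_hz):
    """Cavity length implied by a quarter-wavelength resonance: L = c / (4f)."""
    return SPEED_OF_SOUND_CM_PER_S / (4.0 * peak_hz)

# A cyclic peak in the 3-5 kHz region maps to a centimeter-scale cavity,
# consistent with epilarynx dimensions:
lengths = {f: cavity_length_cm(f) for f in (3000.0, 4000.0, 5000.0)}
```

Under these assumptions a 3.5 kHz peak implies a cavity of about 2.5 cm, which is why the 3-5 kHz cyclic resonance is a plausible epilarynx signature.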


Conference of the International Speech Communication Association (Interspeech) | 2011

Automatic Detection of Depression in Speech Using Gaussian Mixture Modeling with Factor Analysis

Douglas E. Sturim; Pedro A. Torres-Carrasquillo; Thomas F. Quatieri; Nicolas Malyska; Alan McCree

Collaboration


Dive into Nicolas Malyska's collaborations.

Top Co-Authors

Thomas F. Quatieri (Massachusetts Institute of Technology)
Douglas E. Sturim (Massachusetts Institute of Technology)
Robert B. Dunn (Massachusetts Institute of Technology)
Christopher J. Smalt (Massachusetts Institute of Technology)
Elizabeth Godoy (Massachusetts Institute of Technology)
Jennifer T. Melot (Massachusetts Institute of Technology)
Alan McCree (Massachusetts Institute of Technology)
Andrea Carolina Trevino (Massachusetts Institute of Technology)
Brian S. Helfer (Massachusetts Institute of Technology)
Darrell O. Ricke (Massachusetts Institute of Technology)