Publication


Featured research published by Daryush D. Mehta.


IEEE Transactions on Biomedical Engineering | 2012

Mobile Voice Health Monitoring Using a Wearable Accelerometer Sensor and a Smartphone Platform

Daryush D. Mehta; Matías Zañartu; Shengran W. Feng; Harold A. Cheyne; Robert E. Hillman

Many common voice disorders are chronic or recurring conditions that are likely to result from faulty and/or abusive patterns of vocal behavior, referred to generically as vocal hyperfunction. An ongoing goal in clinical voice assessment is the development and use of noninvasively derived measures to quantify and track the daily status of vocal hyperfunction so that the diagnosis and treatment of such behaviorally based voice disorders can be improved. This paper reports on the development of a new, versatile, and cost-effective clinical tool for mobile voice monitoring that acquires the high-bandwidth signal from an accelerometer sensor placed on the neck skin above the collarbone. Using a smartphone as the data acquisition platform, the prototype device provides a user-friendly interface for voice use monitoring, daily sensor calibration, and periodic alert capabilities. Pilot data are reported from three vocally normal speakers and three subjects with voice disorders to demonstrate the potential of the device to yield standard measures of fundamental frequency and sound pressure level and model-based glottal airflow properties. The smartphone-based platform enables future clinical studies for the identification of the best set of measures for differentiating between normal and hyperfunctional patterns of voice use.


ACM Multimedia | 2013

Vocal biomarkers of depression based on motor incoordination

James R. Williamson; Thomas F. Quatieri; Brian S. Helfer; Rachelle Horwitz; Bea Yu; Daryush D. Mehta

In Major Depressive Disorder (MDD), neurophysiologic changes can alter motor control [1, 2] and therefore alter speech production by influencing the characteristics of the vocal source, tract, and prosodics. Clinically, many of these characteristics are associated with psychomotor retardation, where a patient shows sluggishness and motor disorder in vocal articulation, affecting coordination across multiple aspects of production [3, 4]. In this paper, we exploit such effects by selecting features that reflect changes in coordination of vocal tract motion associated with MDD. Specifically, we investigate changes in correlation that occur at different time scales across formant frequencies and also across channels of the delta-mel-cepstrum. Both feature domains provide measures of coordination in vocal tract articulation while reducing effects of a slowly-varying linear channel, which can be introduced by time-varying microphone placements. With these two complementary feature sets, using the AVEC 2013 depression dataset, we design a novel Gaussian mixture model (GMM)-based multivariate regression scheme, referred to as Gaussian Staircase Regression, that provides a root-mean-squared-error (RMSE) of 7.42 and a mean-absolute-error (MAE) of 5.75 on the standard Beck depression rating scale. We are currently exploring coordination measures of other aspects of speech production, derived from both audio and video signals.
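
The abstract above describes measuring how correlation between feature channels (for example, two formant tracks) changes across time scales, but it does not spell out the construction. As a rough illustration only, the sketch below computes a Pearson correlation between two channels at several time delays. The function names (`pearson`, `lagged_correlation`, `multiscale_profile`) and the choice of plain lagged correlation are assumptions made here, not the authors' feature extraction.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag] for a non-negative delay in frames."""
    shortest = min(len(x), len(y))
    if lag < 0 or lag >= shortest - 1:
        raise ValueError("lag must leave at least two overlapping frames")
    n = shortest - lag
    return pearson(x[:n], y[lag:lag + n])


def multiscale_profile(x, y, lags):
    """Hypothetical helper: correlation between two channels at each requested delay."""
    return {lag: lagged_correlation(x, y, lag) for lag in lags}


if __name__ == "__main__":
    # Toy channels: the second channel is the first one delayed by one frame.
    channel_a = [1, 2, 3, 4, 5]
    channel_b = [0, 1, 2, 3, 4, 5]
    print(multiscale_profile(channel_a, channel_b, lags=[0, 1, 2]))
```

A profile like this shows at which delay two channels move together most strongly; how such values are turned into regression features is outside what the abstract states.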


Current Opinion in Otolaryngology & Head and Neck Surgery | 2008

Voice assessment: Updates on perceptual, acoustic, aerodynamic, and endoscopic imaging methods

Daryush D. Mehta; Robert E. Hillman

Purpose of review: This paper describes recent advances in perceptual, acoustic, aerodynamic, and endoscopic imaging methods for assessing voice function. Recent findings: We review advances from four major areas. Perceptual assessment: Speech-language pathologists are being encouraged to use the new consensus auditory-perceptual evaluation of voice inventory for auditory-perceptual assessment of voice quality, and recent studies have provided new insights into listener reliability issues that have plagued subjective perceptual judgments of voice quality. Acoustic assessment: Progress is being made on the development of algorithms that are more robust for analyzing disordered voices, including the capability to extract voice quality-related measures from running speech segments. Aerodynamic assessment: New devices for measuring phonation threshold air pressures and air flows have the potential to serve as sensitive indices of glottal phonatory conditions, and recent developments in aeroacoustic theory may provide new insights into laryngeal sound production mechanisms. Endoscopic imaging: The increased light sensitivity of new ultra high-speed color digital video processors is enabling high-quality endoscopic imaging of vocal fold tissue motion at unprecedented image capture rates, which promises to provide new insights into the mechanisms of normal and disordered voice production. Summary: Some of the recent research advances in voice function assessment could be more readily adopted into clinical practice, whereas others will require further development.


Annals of Otology, Rhinology, and Laryngology | 2010

Voice Production Mechanisms following Phonosurgical Treatment of Early Glottic Cancer

Daryush D. Mehta; Dimitar D. Deliyski; Steven M. Zeitels; Thomas F. Quatieri; Robert E. Hillman

Objectives: Although near-normal conversational voices can be achieved with the phonosurgical management of early glottic cancer, there are still acoustic and aerodynamic deficits in vocal function that must be better understood to help further optimize phonosurgical interventions. Stroboscopic assessment is inadequate for this purpose. Methods: A newly developed color high-speed videoendoscopy (HSV) system that included time-synchronized recordings of the acoustic signal was used to perform a detailed examination of voice production mechanisms in 14 subjects. Digital image processing techniques were used to quantify glottal phonatory function and to delineate relationships between vocal fold vibratory properties and acoustic perturbation measures. Results: The results for multiple measurements of vibratory asymmetry showed that 31% to 62% of subjects displayed higher-than-normal average values, whereas the mean values for glottal closure duration (open quotient) and periodicity of vibration fell within normal limits. The average HSV-based measures did not correlate significantly with the acoustic perturbation measures, but moderate correlations were exhibited between the acoustic measures and the SDs of the HSV-based parameters. Conclusions: The use of simultaneous, time-synchronized HSV and acoustic recordings can provide new insights into postoperative voice production mechanisms that cannot be obtained with stroboscopic assessment.


Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge | 2014

Vocal and Facial Biomarkers of Depression based on Motor Incoordination and Timing

James R. Williamson; Thomas F. Quatieri; Brian S. Helfer; Gregory Ciccarelli; Daryush D. Mehta

In individuals with major depressive disorder, neurophysiological changes often alter motor control and thus affect the mechanisms controlling speech production and facial expression. These changes are typically associated with psychomotor retardation, a condition marked by slowed neuromotor output that is behaviorally manifested as altered coordination and timing across multiple motor-based properties. Changes in motor outputs can be inferred from vocal acoustics and facial movements as individuals speak. We derive novel multi-scale correlation structure and timing feature sets from audio-based vocal features and video-based facial action units from recordings provided by the 4th International Audio/Video Emotion Challenge (AVEC). The feature sets enable detection of changes in coordination, movement, and timing of vocal and facial gestures that are potentially symptomatic of depression. Combining complementary features in Gaussian mixture model and extreme learning machine classifiers, our multivariate regression scheme predicts Beck depression inventory ratings on the AVEC test set with a root-mean-square error of 8.12 and mean absolute error of 6.31. Future work calls for continued study into detection of neurological disorders based on altered coordination and timing across audio and video modalities.
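
The two error figures quoted above are standard regression metrics. For readers unfamiliar with them, here is a minimal sketch of how root-mean-square error and mean absolute error are computed from predicted and reference ratings. The function names and the toy numbers are illustrative assumptions and are unrelated to the AVEC data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, actual):
    """Mean absolute error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Made-up ratings, for illustration only.
    predicted_scores = [10, 22, 5, 31]
    reference_scores = [12, 18, 9, 30]
    print("RMSE:", rmse(predicted_scores, reference_scores))
    print("MAE:", mae(predicted_scores, reference_scores))
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE, which is why the RMSE reported for a system is never smaller than its MAE.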


Current Opinion in Otolaryngology & Head and Neck Surgery | 2012

Current role of stroboscopy in laryngeal imaging.

Daryush D. Mehta; Robert E. Hillman

Purpose of review: To summarize recent technological advancements and insight into the role of stroboscopy in laryngeal imaging. Recent findings: Although stroboscopic technology has not undergone major technological improvements, recent clarifications have been made to the application of stroboscopic principles to video-based laryngeal imaging. Also, recent advances in coupling stroboscopy with high-definition video cameras provide higher spatial resolution of vocal fold vibratory function during phonation. Studies indicate that the interrater reliability of visual stroboscopic assessment varies depending on the laryngeal feature being rated and that only a subset of features may be needed to be representative of an entire assessment. High-speed videoendoscopy (HSV) judgments have been shown to be more sensitive than stroboscopy for evaluating vocal fold phase asymmetry, pointing to the future potential of complementing stroboscopy with alternative imaging modalities in hybrid systems. Laryngeal videostroboscopy alone continues to play a central role in clinical voice assessment. Even though HSV may provide more detailed information about phonatory function, its eventual clinical adoption will depend on how remaining practical, technical, and methodological challenges will be met. Summary: Laryngeal videostroboscopy continues to be the modality of choice for imaging vocal fold vibration, but technological advancements in HSV and associated research findings are driving increased interest in the clinical adoption of HSV to complement videostroboscopic assessment.


Journal of the Acoustical Society of America | 2011

Observation and analysis of in vivo vocal fold tissue instabilities produced by nonlinear source-filter coupling: a case study.

Matías Zañartu; Daryush D. Mehta; Julio C. Ho; George R. Wodicka; Robert E. Hillman

Different source-related factors can lead to vocal fold instabilities and bifurcations referred to as voice breaks. Nonlinear coupling in phonation suggests that changes in acoustic loading can also be responsible for this unstable behavior. However, no in vivo visualization of tissue motion during these acoustically induced instabilities has been reported. Simultaneous recordings of laryngeal high-speed videoendoscopy, acoustics, aerodynamics, electroglottography, and neck skin acceleration are obtained from a participant consistently exhibiting voice breaks during pitch glide maneuvers. Results suggest that acoustically induced and source-induced instabilities can be distinguished at the tissue level. Differences in vibratory patterns are described through kymography and phonovibrography; measures of glottal area, open/speed quotient, and amplitude/phase asymmetry; and empirical orthogonal function decomposition. Acoustically induced tissue instabilities appear abruptly and exhibit irregular vocal fold motion after the bifurcation point, whereas source-induced ones show a smoother transition. These observations are also reflected in the acoustic and acceleration signals. Added aperiodicity is observed after the acoustically induced break, and harmonic changes appear prior to the bifurcation for the source-induced break. Both types of breaks appear to be subcritical bifurcations due to the presence of hysteresis and amplitude changes after the frequency jumps. These results are consistent with previous studies and the nonlinear source-filter coupling theory.


Journal of the Acoustical Society of America | 2012

Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking.

Daryush D. Mehta; Daniel Rudoy; Patrick J. Wolfe

Vocal tract resonance characteristics in acoustic speech signals are classically tracked using frame-by-frame point estimates of formant frequencies followed by candidate selection and smoothing using dynamic programming methods that minimize ad hoc cost functions. The goal of the current work is to provide both point estimates and associated uncertainties of center frequencies and bandwidths in a statistically principled state-space framework. Extended Kalman (K) algorithms take advantage of a linearized mapping to infer formant and antiformant parameters from frame-based estimates of autoregressive moving average (ARMA) cepstral coefficients. Error analysis of KARMA, WaveSurfer, and Praat is accomplished in the all-pole case using a manually marked formant database and synthesized speech waveforms. KARMA formant tracks exhibit lower overall root-mean-square error relative to the two benchmark algorithms with the ability to modify parameters in a controlled manner to trade off bias and variance. Antiformant tracking performance of KARMA is illustrated using synthesized and spoken nasal phonemes. The simultaneous tracking of uncertainty levels enables practitioners to recognize time-varying confidence in parameters of interest and adjust algorithmic settings accordingly.
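
To illustrate what tracking a point estimate together with its uncertainty means in a state-space framework, the sketch below runs a scalar linear Kalman filter over a sequence of noisy frequency observations. This is a deliberately simplified stand-in: it is not the extended Kalman, ARMA-cepstrum method summarized above, and the function name, the random-walk state model, and all numeric values are assumptions chosen for illustration.

```python
def kalman_track(observations, process_var, obs_var, init_mean, init_var):
    """Scalar Kalman filter with a random-walk state model.

    Returns one (mean, variance) pair per frame. A frame whose observation
    is None is treated as missing: only the prediction step runs, so the
    variance grows for that frame.
    """
    mean, var = init_mean, init_var
    track = []
    for z in observations:
        # Predict: the state is assumed to drift as a random walk.
        var = var + process_var
        if z is not None:
            # Update: blend prediction and observation by their uncertainties.
            gain = var / (var + obs_var)
            mean = mean + gain * (z - mean)
            var = (1.0 - gain) * var
        track.append((mean, var))
    return track


if __name__ == "__main__":
    # Hypothetical per-frame frequency observations in Hz, one frame missing.
    frames = [510.0, 495.0, None, 502.0, 498.0]
    for mean, var in kalman_track(frames, process_var=1.0, obs_var=100.0,
                                  init_mean=400.0, init_var=1e4):
        print(round(mean, 1), round(var, 1))
```

Raising `process_var` lets the estimate follow rapid changes at the cost of noisier tracks, while lowering it smooths the track but can lag behind true movement, which is one simple form of a bias and variance trade-off.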


IEEE Transactions on Biomedical Engineering | 2014

Learning to Detect Vocal Hyperfunction From Ambulatory Neck-Surface Acceleration Features: Initial Results for Vocal Fold Nodules

Marzyeh Ghassemi; Jarrad H. Van Stan; Daryush D. Mehta; Matías Zañartu; Harold A. Cheyne; Robert E. Hillman; John V. Guttag

Voice disorders are medical conditions that often result from vocal abuse/misuse, which is referred to generically as vocal hyperfunction. Standard voice assessment approaches cannot accurately determine the actual nature, prevalence, and pathological impact of hyperfunctional vocal behaviors because such behaviors can vary greatly across the course of an individual's typical day and may not be clearly demonstrated during a brief clinical encounter. Thus, it would be clinically valuable to develop noninvasive ambulatory measures that can reliably differentiate vocal hyperfunction from normal patterns of vocal behavior. As an initial step toward this goal, we used an accelerometer taped to the neck surface to provide a continuous, noninvasive acceleration signal designed to capture some aspects of vocal behavior related to vocal cord nodules, a common manifestation of vocal hyperfunction. We gathered data from 12 female adult patients diagnosed with vocal fold nodules and 12 control speakers matched for age and occupation. We derived features from weeklong neck-surface acceleration recordings by using distributions of sound pressure level and fundamental frequency over 5-min windows of the acceleration signal and normalized these features so that intersubject comparisons were meaningful. We then used supervised machine learning to show that the two groups exhibit distinct vocal behaviors that can be detected using the acceleration signal. We were able to correctly classify 22 of the 24 subjects, suggesting that in the future measures of the acceleration signal could be used to detect patients with the types of aberrant vocal behaviors that are associated with hyperfunctional voice disorders.
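
The feature step described above (summarizing per-frame values over fixed windows, then normalizing so that subjects can be compared) can be sketched roughly as follows. The abstract does not state which distribution statistics or which normalization were used, so the statistics chosen here (mean, standard deviation, minimum, maximum), the z-score normalization, and the function names are assumptions for illustration. The code needs Python 3.8 or later for `statistics.fmean`.

```python
import statistics


def window_features(values, frames_per_window):
    """Summarize a per-frame series over consecutive non-overlapping windows.

    Trailing frames that do not fill a whole window are dropped.
    """
    if frames_per_window <= 0:
        raise ValueError("frames_per_window must be positive")
    features = []
    last_start = len(values) - frames_per_window
    for start in range(0, last_start + 1, frames_per_window):
        window = values[start:start + frames_per_window]
        features.append({
            "mean": statistics.fmean(window),
            "std": statistics.pstdev(window),
            "min": min(window),
            "max": max(window),
        })
    return features


def zscore(values):
    """Normalize one subject's feature values to zero mean and unit variance."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Hypothetical per-frame level estimates in dB for one subject.
    levels = [62.0, 65.0, 70.0, 68.0, 71.0, 74.0, 66.0]
    per_window = window_features(levels, frames_per_window=3)
    window_means = [f["mean"] for f in per_window]
    print(per_window)
    print(zscore(window_means))
```

In a real recording, `frames_per_window` would be the number of analysis frames that fit in five minutes at whatever frame rate the upstream processing uses.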


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Subglottal Impedance-Based Inverse Filtering of Voiced Sounds Using Neck Surface Acceleration

Matías Zañartu; Julio C. Ho; Daryush D. Mehta; Robert E. Hillman; George R. Wodicka

A model-based inverse filtering scheme is proposed for an accurate, non-invasive estimation of the aerodynamic source of voiced sounds at the glottis. The approach, referred to as subglottal impedance-based inverse filtering (IBIF), takes as input the signal from a lightweight accelerometer placed on the skin over the extrathoracic trachea and yields estimates of glottal airflow and its time derivative, offering important advantages over traditional methods that deal with the supraglottal vocal tract. The proposed scheme is based on mechano-acoustic impedance representations from a physiologically-based transmission line model and a lumped skin surface representation. A subject-specific calibration protocol is used to account for individual adjustments of subglottal impedance parameters and mechanical properties of the skin. Preliminary results for sustained vowels with various voice qualities show that the subglottal IBIF scheme yields comparable estimates with respect to current aerodynamics-based methods of clinical vocal assessment. A mean absolute error of less than 10% was observed for two glottal airflow measures (maximum flow declination rate and amplitude of the modulation component) that have been associated with the pathophysiology of some common voice disorders caused by faulty and/or abusive patterns of vocal behavior (i.e., vocal hyperfunction). The proposed method further advances the ambulatory assessment of vocal function based on the neck acceleration signal, which previously has been limited to the estimation of phonation duration, loudness, and pitch. Subglottal IBIF is also suitable for other ambulatory applications in speech communication, for which further evaluation is underway.

Collaboration


Top Co-Authors

Thomas F. Quatieri

Massachusetts Institute of Technology

Dimitar D. Deliyski

Cincinnati Children's Hospital Medical Center

Brian S. Helfer

Massachusetts Institute of Technology

John V. Guttag

Massachusetts Institute of Technology