Publication


Featured research published by Hosung Nam.


Journal of the Acoustical Society of America | 2004

TADA: An enhanced, portable Task Dynamics model in MATLAB

Hosung Nam; Louis Goldstein; Elliot Saltzman; Dani Byrd

A portable computational system called TADA was developed for the Task Dynamic model of speech motor control [Saltzman and Munhall, Ecol. Psychol. 1, 333–382 (1989)]. The model maps from a set of linguistic gestures, specified as activation functions with corresponding constriction goal parameters, to time functions for a set of model articulators. The original Task Dynamic code was ported to the (relatively) platform‐independent MATLAB environment and includes a MATLAB version of the Haskins articulatory synthesizer, so that articulator motions computed by the Task Dynamic model can be used to generate sound. Gestural scores can now be edited graphically, and the effects of gestural score changes on the model's output can be evaluated. Other new features of the system include: (1) A graphical user interface that displays the input gestural scores, output time functions of constriction goal variables and articulators, and an animation of the resulting vocal‐tract motion; (2) Integration of the Task Dynamic model w...
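In the Task Dynamic framework, each gesture drives its constriction goal variable toward a target as a critically damped second-order (point-attractor) system while the gesture's activation is on. The Python sketch below illustrates that dynamic for a single gesture; it is a simplified illustration, not TADA code, and the function name, activation gating, and parameter values are assumptions made for the example.

```python
import numpy as np

# Minimal sketch of one task-dynamic "gesture": a critically damped
# point attractor drives a tract variable z toward its target while
# the gesture's activation a(t) is on.  Parameter values are
# illustrative, not TADA's settings.
def simulate_gesture(z_init, target, stiffness, activation, dt=0.001):
    """Integrate z'' = -B*z' - K*a(t)*(z - target) with unit mass."""
    damping = 2.0 * np.sqrt(stiffness)          # critical damping for M = 1
    z, v = z_init, 0.0
    trajectory = []
    for a in activation:                        # a(t) in [0, 1]
        acc = -damping * v - stiffness * a * (z - target)
        v += acc * dt
        z += v * dt
        trajectory.append(z)
    return np.array(trajectory)

# Example: a lip-aperture closing gesture active for the first 200 ms.
activation = np.concatenate([np.ones(200), np.zeros(100)])
lip_aperture = simulate_gesture(z_init=10.0, target=0.0,
                                stiffness=400.0, activation=activation)
```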


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Articulatory Information for Noise Robust Speech Recognition

Vikramjit Mitra; Hosung Nam; Carol Y. Espy-Wilson; Elliot Saltzman; Louis Goldstein

Prior research has shown that articulatory information, if extracted properly from the speech signal, can improve the performance of automatic speech recognition systems. However, such information is not readily available in the signal. The challenge posed by the estimation of articulatory information from speech acoustics has led to a new line of research known as “acoustic-to-articulatory inversion” or “speech-inversion.” While most of the research in this area has focused on estimating articulatory information more accurately, few have explored ways to apply this information in speech recognition tasks. In this paper, we first estimated articulatory information in the form of vocal tract constriction variables (abbreviated as TVs) from the Aurora-2 speech corpus using a neural network based speech-inversion model. Word recognition tasks were then performed for both noisy and clean speech using articulatory information in conjunction with traditional acoustic features. Our results indicate that incorporating TVs can significantly improve word recognition rates when used in conjunction with traditional acoustic features.
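A minimal sketch of the overall idea, assuming a feedforward inversion network over windowed MFCC features whose estimated TVs are concatenated with the acoustic features before recognition; the layer sizes, feature dimensions, and function names are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

# Assumed dimensions for the sketch (not the paper's settings).
N_CONTEXT_FRAMES = 9      # acoustic context window, in frames
N_MFCC = 13               # per-frame acoustic features
N_TVS = 8                 # LA, LP, TBCD, TBCL, TTCD, TTCL, VEL, GLO

# Feedforward speech-inversion network: acoustic window -> TV estimates.
inversion_net = nn.Sequential(
    nn.Linear(N_CONTEXT_FRAMES * N_MFCC, 256),
    nn.Tanh(),
    nn.Linear(256, 256),
    nn.Tanh(),
    nn.Linear(256, N_TVS),
)

def augmented_features(acoustic_window):
    """Concatenate estimated TVs with the acoustic features so that a
    recognizer sees both streams."""
    with torch.no_grad():
        tvs = inversion_net(acoustic_window)
    return torch.cat([acoustic_window, tvs], dim=-1)

# Example: a batch of 4 context windows.
x = torch.randn(4, N_CONTEXT_FRAMES * N_MFCC)
features = augmented_features(x)   # shape: (4, 117 + 8)
```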


IEEE Journal of Selected Topics in Signal Processing | 2010

Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies

Vikramjit Mitra; Hosung Nam; Carol Y. Espy-Wilson; Elliot Saltzman; Louis Goldstein

Many different studies have claimed that articulatory information can be used to improve the performance of automatic speech recognition systems. Unfortunately, such articulatory information is not readily available in typical speaker-listener situations. Consequently, such information has to be estimated from the acoustic signal in a process which is usually termed “speech-inversion.” This study aims to propose and compare various machine learning strategies for speech inversion: Trajectory mixture density networks (TMDNs), feedforward artificial neural networks (FF-ANN), support vector regression (SVR), autoregressive artificial neural network (AR-ANN), and distal supervised learning (DSL). Further, using a database generated by the Haskins Laboratories speech production model, we test the claim that information regarding constrictions produced by the distinct organs of the vocal tract (vocal tract variables) is superior to flesh-point information (articulatory pellet trajectories) for the inversion process.


Language Learning and Development | 2011

An Articulatory Phonology Account of Preferred Consonant-Vowel Combinations

Sara Giulivi; D. H. Whalen; Louis Goldstein; Hosung Nam; Andrea G. Levitt

Certain consonant/vowel combinations (labial/central, coronal/front, velar/back) are more frequent in babbling as well as, to a lesser extent, in adult language than chance would dictate. The “Frame then Content” (F/C) hypothesis (Davis & MacNeilage, 1994) attributes this pattern to biomechanical vocal-tract biases that change as infants mature. Articulatory Phonology (AP; Browman & Goldstein, 1989) attributes preferences to demands placed on shared articulators. F/C implies that preferences will diminish as articulatory control increases, while AP does not. Here, babbling from children at 6, 9, and 12 months in English, French, and Mandarin environments was examined. There was no developmental trend in CV preferences, although older ages exhibited greater articulatory control. A perception test showed no evidence of bias toward hearing the preferred combinations. Modeling using articulatory synthesis found limited support for F/C but more for AP, including data not originally encompassed in F/C. AP thus provides an alternative biomechanical explanation.


Archive | 2006

The Distinctions Between State, Parameter and Graph Dynamics in Sensorimotor Control and Coordination

Elliot Saltzman; Hosung Nam; Louis Goldstein; Dani Byrd

The dynamical systems underlying the performance and learning of skilled behaviors can be analyzed in terms of state-, parameter-, and graph-dynamics. We review these concepts and then focus on the manner in which variation in dynamical graph structure can be used to explicate the temporal patterning of speech. Simulations are presented of speech gestural sequences using the task-dynamic model of speech production, and the importance of system graphs in shaping intergestural relative phasing patterns (both their mean values and their variability) within and between syllables is highlighted.


International Conference on Acoustics, Speech, and Signal Processing | 2009

From acoustics to Vocal Tract time functions

Vikramjit Mitra; I. Yücel Özbek; Hosung Nam; Xinhui Zhou; Carol Y. Espy-Wilson

In this paper we present a technique for obtaining Vocal Tract (VT) time functions from the acoustic speech signal. Knowledge-based Acoustic Parameters (APs) are extracted from the speech signal and a pertinent subset is used to obtain the mapping between them and the VT time functions. Eight vocal tract constriction variables were considered in this study: five constriction degree variables, lip aperture (LA), tongue body (TBCD), tongue tip (TTCD), velum (VEL), and glottis (GLO), and three constriction location variables, lip protrusion (LP), tongue tip (TTCL), and tongue body (TBCL). The TAsk Dynamics Application model (TADA [1]) is used to create a synthetic speech dataset along with its corresponding VT time functions. We explore Support Vector Regression (SVR) followed by Kalman smoothing to obtain the mapping between the APs and the VT time functions.
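The two-stage mapping can be sketched as follows, with scikit-learn's SVR standing in for the regression stage and a one-dimensional random-walk Kalman filter standing in for the smoothing stage; the feature dimensions, noise variances, and helper names are assumptions for illustration rather than the paper's setup.

```python
import numpy as np
from sklearn.svm import SVR

def train_svr(aps_train, tv_train):
    """Fit one SVR per tract variable: aps_train is (n_frames, n_aps),
    tv_train is (n_frames,) for a single VT time function."""
    model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
    model.fit(aps_train, tv_train)
    return model

def kalman_smooth(measurements, process_var=1e-4, meas_var=1e-2):
    """One-dimensional random-walk Kalman filter over the SVR outputs,
    a simplified stand-in for the smoothing stage."""
    x, p = measurements[0], 1.0
    out = []
    for z in measurements:
        p += process_var                      # predict
        k = p / (p + meas_var)                # Kalman gain
        x += k * (z - x)                      # update
        p *= (1.0 - k)
        out.append(x)
    return np.array(out)

# Example with stand-in data: 200 frames, 13 acoustic parameters.
aps = np.random.randn(200, 13)
lip_aperture = np.sin(np.linspace(0, 3, 200))   # pretend LA trajectory
svr = train_svr(aps, lip_aperture)
lip_aperture_smooth = kalman_smooth(svr.predict(aps))
```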


Journal of Phonetics | 2015

Vowel variability in elicited versus spontaneous speech: Evidence from Mixtec

Christian DiCanio; Hosung Nam; Jonathan D. Amith; Rey Castillo García; D. H. Whalen

This study investigates the influence of speech style, duration, contextual factors, and sex on vowel dispersion and variability in Yoloxochitl Mixtec, an endangered language spoken in Mexico. Oral vowels were examined from recordings of elicited citation words and spontaneous narrative speech matched across seven speakers. Results show spontaneous speech to contain shorter vowel durations and stronger effects of contextual assimilation than elicited speech. The vowel space is less dispersed and intra-vowel variability is greater in spontaneous speech than in elicited speech. Furthermore, male speakers show smaller differences in vowel dispersion and duration across styles than female speakers do. These phonetic differences across speech styles are not entirely reducible to durational differences; rather, speakers also seem to adjust their articulatory/acoustic precision in accordance with style. Despite the stylistic differences, we find robust acoustic differences between vowels in spontaneous speech, maintaining the overall vowel space pattern. While style and durational changes produce noticeable differences in vowel acoustics, one can closely approximate the phonetics of a vowel system of an endangered language from narrative speech. Elicited speech is more likely than spontaneous speech to yield the most extreme formant values used by the language, but the usefulness of phonetic data from spontaneous speech is nonetheless demonstrated.


Journal of Phonetics | 2012

Bridging planning and execution: Temporal planning of syllables.

Christine Mooshammer; Louis Goldstein; Hosung Nam; Scott McClure; Elliot Saltzman; Mark Tiede

This study compares the time to initiate words with varying syllable structures (V, VC, CV, CVC, CCV, CCVC). In order to test the hypothesis that different syllable structures require different amounts of time to prepare their temporal controls, or plans, two delayed naming experiments were carried out. In the first of these, the initiation time was determined from acoustic recordings. The results confirmed the hypothesis but also showed an interaction with the initial segment (i.e., vowel-initial words were initiated later than words beginning with consonants, but this difference was much smaller for words beginning with stops compared to /l/ or /s/). Adding a coda did not affect the initiation time. In order to rule out effects of segment-specific articulatory-to-acoustic interval differences, a second experiment was performed in which speech movements of the tongue, the jaw, and the lips were recorded by means of electromagnetic articulography. Results for initiation time based on articulatory measurements showed a significant syllable structure effect, with VC words being initiated significantly later than CV(C) words. Only minor effects of the initial segment were found. These results can be partly explained by the amount of accumulated experience a speaker has in coordinating the relevant gesture combinations and triggering them appropriately in time.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Gesture-based Dynamic Bayesian Network for noise robust speech recognition

Vikramjit Mitra; Hosung Nam; Carol Y. Espy-Wilson; Elliot Saltzman; Louis Goldstein

Previously we have proposed different models for estimating articulatory gestures and vocal tract variable (TV) trajectories from synthetic speech. We have shown that when deployed on natural speech, such models can help to improve the noise robustness of a hidden Markov model (HMM) based speech recognition system. In this paper we propose a model for estimating TVs that is trained on natural speech, and we present a Dynamic Bayesian Network (DBN) based speech recognition architecture that treats vocal tract constriction gestures as hidden variables, eliminating the need for explicit gesture recognition. Using the proposed architecture we performed a word recognition task on the noisy data of Aurora-2. Using the gestural information as hidden variables in a DBN architecture yielded significant improvement over using only a mel-frequency cepstral coefficient based HMM or DBN backend. We also compare our results with other noise-robust front ends.
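The idea of treating gestures as hidden variables can be illustrated, in a much simplified form, by a forward recursion that marginalizes over a discrete gestural state at every frame instead of committing to a recognized gesture sequence. The sketch below is schematic only; the state inventory, parameters, and function names are invented for the example and do not reproduce the paper's DBN.

```python
import numpy as np

# Schematic hidden-variable illustration (not the paper's DBN): each
# frame's likelihood sums over all possible gestural states.
N_GESTURE_STATES = 4          # e.g. closure, critical, narrow, open
rng = np.random.default_rng(0)

trans = rng.dirichlet(np.ones(N_GESTURE_STATES), size=N_GESTURE_STATES)
means = rng.normal(size=(N_GESTURE_STATES, 8))       # 8 TVs per state

def emission_likelihood(tv_frame):
    """Spherical-Gaussian likelihood of one TV frame under each state."""
    diff = tv_frame - means
    return np.exp(-0.5 * np.sum(diff ** 2, axis=1))

def log_likelihood(tv_frames):
    """Forward recursion that marginalizes the hidden gestural state."""
    alpha = np.full(N_GESTURE_STATES, 1.0 / N_GESTURE_STATES)
    loglik = 0.0
    for frame in tv_frames:
        alpha = (alpha @ trans) * emission_likelihood(frame)
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()                          # rescale for stability
    return loglik

print(log_likelihood(rng.normal(size=(50, 8))))
```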


Journal of the Acoustical Society of America | 2012

A procedure for estimating gestural scores from speech acoustics

Hosung Nam; Vikramjit Mitra; Mark Tiede; Mark Hasegawa-Johnson; Carol Y. Espy-Wilson; Elliot Saltzman; Louis Goldstein

Speech can be represented as a constellation of constricting vocal tract actions called gestures, whose temporal patterning with respect to one another is expressed in a gestural score. Current speech datasets do not come with gestural annotation, and no formal gestural annotation procedure exists at present. This paper describes an iterative analysis-by-synthesis, landmark-based time-warping architecture to perform gestural annotation of natural speech. For a given utterance, the Haskins Laboratories Task Dynamics and Application (TADA) model is employed to generate a corresponding prototype gestural score. The gestural score is temporally optimized through an iterative time-warping process such that the acoustic distance between the original and TADA-synthesized speech is minimized. This paper demonstrates that the proposed iterative approach is superior to conventional acoustically referenced dynamic time-warping procedures and provides reliable gestural annotation for speech datasets.
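A highly simplified outer loop for such an analysis-by-synthesis refinement might look like the sketch below, where dtw_distance supplies the acoustic distance and synthesize and perturb are hypothetical callbacks standing in for TADA synthesis and the landmark-based timing adjustment described above; the actual procedure is landmark-driven rather than the simple accept-if-better loop shown here.

```python
import numpy as np

def dtw_distance(a, b):
    """Plain dynamic time warping distance between two feature sequences
    (each a sequence of per-frame feature vectors)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]

def refine_gestural_score(natural_feats, score, synthesize, perturb,
                          n_iter=20):
    """Keep a candidate timing adjustment whenever it reduces the acoustic
    distance between the natural and the synthesized utterance.
    `synthesize` and `perturb` are hypothetical callbacks supplied by the
    caller (e.g. TADA synthesis and a landmark-based timing warp)."""
    best = dtw_distance(natural_feats, synthesize(score))
    for _ in range(n_iter):
        candidate = perturb(score)
        dist = dtw_distance(natural_feats, synthesize(candidate))
        if dist < best:
            best, score = dist, candidate
    return score
```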

Collaboration


Hosung Nam's most frequent co-authors and their affiliations.

Top Co-Authors

Louis Goldstein
University of Southern California

D. H. Whalen
City University of New York

Mark Tiede
Massachusetts Institute of Technology