Publication


Featured research published by Jangwon Kim.


Journal of the Acoustical Society of America | 2014

Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research (TC)

Shrikanth Narayanan; Asterios Toutios; Vikram Ramanarayanan; Adam C. Lammert; Jangwon Kim; Sungbok Lee; Krishna S. Nayak; Yoon Chul Kim; Yinghua Zhu; Louis Goldstein; Dani Byrd; Erik Bresch; Athanasios Katsamanis; Michael Proctor

USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have also been collected from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460-sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community.


Journal of the Acoustical Society of America | 2014

Co-registration of speech production datasets from electromagnetic articulography and real-time magnetic resonance imaging

Jangwon Kim; Adam C. Lammert; Shrikanth Narayanan

This paper describes a spatio-temporal registration approach for speech articulation data obtained from electromagnetic articulography (EMA) and real-time Magnetic Resonance Imaging (rtMRI). This is motivated by the potential for combining the complementary advantages of both types of data. The registration method is validated on EMA and rtMRI datasets obtained at different times, but using the same stimuli. The aligned corpus offers the advantages of high temporal resolution (from EMA) and a complete mid-sagittal view (from rtMRI). The co-registration also yields optimum placement of EMA sensors as articulatory landmarks on the magnetic resonance images, thus providing richer spatio-temporal information about articulatory dynamics.
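
The spatial part of such a registration can be pictured as fitting a similarity transform (scale, rotation, translation) that carries EMA sensor coordinates into the MRI image plane from a handful of corresponding landmarks, such as a palate trace. The sketch below is only an illustration of that idea, not the paper's actual procedure; the landmark arrays and the Procrustes-style solution are assumptions.

```python
import numpy as np

def fit_similarity_transform(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping 2-D points src onto dst; both arrays are shaped (N, 2)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    # Orthogonal Procrustes: SVD of the cross-covariance matrix.
    U, S, Vt = np.linalg.svd(dst_c.T @ src_c)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    scale = (S * np.diag(D)).sum() / (src_c ** 2).sum()
    t = dst_mean - scale * (R @ src_mean)
    return scale, R, t

# Hypothetical corresponding landmarks: an EMA palate trace (mm) and the same
# palate contour picked by hand from one MRI midsagittal frame (pixels).
ema_palate = np.array([[0.0, 0.0], [10.0, 2.0], [20.0, 3.5], [30.0, 2.5]])
mri_palate = np.array([[42.0, 60.0], [46.0, 59.0], [50.1, 58.3], [54.0, 58.8]])

s, R, t = fit_similarity_transform(ema_palate, mri_palate)
ema_in_mri = s * (ema_palate @ R.T) + t   # EMA sensor coordinates in MRI pixels
print(np.round(ema_in_mri, 2))
```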


International Conference on Acoustics, Speech, and Signal Processing | 2010

An exploratory study of manifolds of emotional speech

Jangwon Kim; Sungbok Lee; Shrikanth Narayanan

This study explores manifold representations of emotionally modulated speech. The manifolds are derived in the articulatory space and two acoustic spaces (MFB and MFCC) using isometric feature mapping (Isomap) with data from an emotional speech corpus. Their effectiveness in representing emotional speech is tested based on the emotion classification accuracy. Results show that the effective manifold dimensions of the articulatory and MFB spaces are both about 5 while being greater in MFCC space. Also, the accuracies in the articulatory and MFB manifolds are close to those in the original spaces, but this is not the case for the MFCC. It is speculated that the manifold in the MFCC space is less structured, or more distorted, than others.
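
As a rough illustration of the kind of analysis described (not the paper's pipeline or data), the sketch below embeds stand-in feature frames with scikit-learn's Isomap and scores a nearest-neighbour emotion classifier at several manifold dimensions; in the study the frames would be articulatory, MFB, or MFCC vectors.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in data: 600 frames of 20-dimensional features with 5 emotion labels.
X = rng.normal(size=(600, 20))
y = rng.integers(0, 5, size=600)

for n_dims in (2, 5, 10, 15):
    Z = Isomap(n_neighbors=10, n_components=n_dims).fit_transform(X)
    acc = cross_val_score(KNeighborsClassifier(5), Z, y, cv=5).mean()
    print(f"Isomap dim {n_dims:2d}: emotion classification accuracy = {acc:.3f}")
```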


Computer Speech & Language | 2016

Speaker verification based on the fusion of speech acoustics and inverted articulatory signals

Ming Li; Jangwon Kim; Adam C. Lammert; Vikram Ramanarayanan; Shrikanth Narayanan

We propose a practical, feature-level and score-level fusion approach by combining acoustic and estimated articulatory information for both text independent and text dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with the conventional acoustic features. On text independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the performance dramatically. However, since directly measuring articulatory data is not feasible in many real world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature level and score level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we included inverted articulatory trajectories in text dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong password trials and improve the performance after score level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the aforementioned two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks.
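
A minimal sketch of the score-level fusion idea on synthetic trial scores; the weights, the z-normalisation, and the equal-error-rate routine below are illustrative assumptions rather than the paper's system.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER for verification scores (higher = more target-like).
    labels is 1 for target trials, 0 for impostor trials."""
    fars, frrs = [], []
    for th in np.sort(np.unique(scores)):
        decisions = scores >= th
        fars.append(np.mean(decisions[labels == 0]))   # false acceptances
        frrs.append(np.mean(~decisions[labels == 1]))  # false rejections
    fars, frrs = np.array(fars), np.array(frrs)
    i = np.argmin(np.abs(fars - frrs))
    return (fars[i] + frrs[i]) / 2.0

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=2000)

# Hypothetical per-trial scores from two subsystems: an acoustic (MFCC)
# verifier and a verifier built on inverted articulatory trajectories.
acoustic = rng.normal(loc=labels * 1.5, scale=1.0)
articulatory = rng.normal(loc=labels * 1.0, scale=1.0)

# Score-level fusion: a weighted sum after z-normalising each score stream.
znorm = lambda s: (s - s.mean()) / s.std()
fused = 0.7 * znorm(acoustic) + 0.3 * znorm(articulatory)

for name, s in [("acoustic", acoustic), ("articulatory", articulatory), ("fused", fused)]:
    print(f"{name:13s} EER = {equal_error_rate(s, labels):.3f}")
```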


International Conference on Acoustics, Speech, and Signal Processing | 2016

Pathological speech processing: State-of-the-art, current challenges, and future directions

Rahul Gupta; Theodora Chaspari; Jangwon Kim; Naveen Kumar; Daniel Bone; Shrikanth Narayanan

The study of speech pathology involves evaluation and treatment of speech production related disorders affecting phonation, fluency, intonation and aeromechanical components of respiration. Recently, speech pathology has garnered special interest amongst machine learning and signal processing (ML-SP) scientists. This growth in interest is led by advances in novel data collection technology, data science, speech processing and computational modeling. These in turn have enabled scientists in better understanding both the causes and effects of pathological speech conditions. In this paper, we review the application of machine learning and signal processing techniques to speech pathology and specifically focus on three different aspects. First, we list challenges such as controlling subjectivity in pathological speech assessments and patient variability in the application of ML-SP tools to the domain. Second, we discuss feature design methods and machine learning algorithms using a combination of domain knowledge and data driven methods. Finally, we present some case studies related to analysis of pathological speech and discuss their design.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Spatial and temporal alignment of multimodal human speech production data: Real time imaging, flesh point tracking and audio

Jangwon Kim; Adam C. Lammert; Shrikanth Narayanan

In speech production research, the integration of articulatory data derived from multiple measurement modalities can provide a rich description of vocal tract dynamics by overcoming the limited spatio-temporal representations offered by individual modalities. This paper presents a spatial and temporal alignment method between two promising modalities using a corpus of TIMIT sentences obtained from the same speaker: flesh point tracking from Electromagnetic Articulography (EMA), which offers high temporal resolution but sparse spatial information, and real-time Magnetic Resonance Imaging (MRI), which offers good spatial detail but at lower temporal rates. Spatial alignment is done by using palate tracking of EMA, but distortion in MRI audio and articulatory data variability make temporal alignment challenging. This paper proposes a novel alignment technique using joint acoustic-articulatory features which combines dynamic time warping and automatic feature extraction from MRI images. Experimental results show that the temporal alignment obtained using this technique is better (12% relative) than that using acoustic features only.
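
The temporal step can be pictured as dynamic time warping over a cost matrix built from concatenated acoustic and articulatory features from the two sessions. The sketch below is illustrative only: the feature dimensions are assumptions and the sequences are random stand-ins for MFCCs plus MRI- or EMA-derived articulatory parameters.

```python
import numpy as np

def dtw_path(cost):
    """Classic DTW over a pre-computed local cost matrix (n x m)."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    # Backtrack the optimal warping path.
    path, i, j = [], n, m
    while (i, j) != (1, 1):
        path.append((i - 1, j - 1))
        steps = {(i - 1, j): acc[i - 1, j],
                 (i, j - 1): acc[i, j - 1],
                 (i - 1, j - 1): acc[i - 1, j - 1]}
        i, j = min(steps, key=steps.get)
    path.append((0, 0))
    return path[::-1]

rng = np.random.default_rng(2)
# Stand-ins: 13-dim MFCCs plus 6-dim articulatory parameters per frame,
# assumed z-scored so neither stream dominates the joint distance.
ema_seq = rng.normal(size=(120, 13 + 6))   # EMA-session frames
mri_seq = rng.normal(size=(90, 13 + 6))    # rtMRI-session frames

cost = np.linalg.norm(ema_seq[:, None, :] - mri_seq[None, :, :], axis=-1)
path = dtw_path(cost)
print("warping path length:", len(path))
```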


Journal of the Acoustical Society of America | 2015

A kinematic study of critical and non-critical articulators in emotional speech production.

Jangwon Kim; Asterios Toutios; Sungbok Lee; Shrikanth Narayanan

This study explores one aspect of the articulatory mechanism that underlies emotional speech production, namely, the behavior of linguistically critical and non-critical articulators in the encoding of emotional information. The hypothesis is that the larger kinematic variability possible in the behavior of non-critical articulators reveals the underlying emotional expression goals more explicitly than that of the critical articulators; the critical articulators are strictly controlled in service of achieving linguistic goals and exhibit smaller kinematic variability. This hypothesis is examined by kinematic analysis of the movements of critical and non-critical speech articulators gathered using electromagnetic articulography during spoken expressions of five categorical emotions. Analysis results at the level of consonant-vowel-consonant segments reveal that critical articulators for the consonants show more (less) peripheral articulations during production of the consonant-vowel-consonant syllables for high (low) arousal emotions, while non-critical articulators show emotional variation in articulatory position that is less sensitive to the linguistic gestures. Analysis results at individual phonetic targets show that, overall, between- and within-emotion variability in articulatory positions is larger for non-critical cases than for critical cases. Finally, the results of simulation experiments suggest that the postural variation of non-critical articulators depending on emotion is significantly associated with the control of critical articulators.
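
One way to picture the variability comparison is to pool an articulator's position at a phonetic target across emotions and compare within-emotion and between-emotion spread for a critical versus a non-critical articulator. The sketch below uses fabricated positions purely to illustrate those two quantities; it is not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
emotions = np.repeat(np.arange(5), 60)   # 5 emotions x 60 tokens of one target

def variability(positions, emotions):
    """Within-emotion spread (mean per-emotion SD) and between-emotion spread
    (SD of per-emotion mean positions) for one articulator dimension."""
    per_emotion = [positions[emotions == e] for e in np.unique(emotions)]
    within = np.mean([p.std() for p in per_emotion])
    between = np.std([p.mean() for p in per_emotion])
    return within, between

# Fabricated articulator heights (mm): the "critical" articulator is tightly
# controlled for the consonant; the "non-critical" one drifts with emotion.
critical = rng.normal(loc=12.0, scale=0.4, size=emotions.size)
non_critical = rng.normal(loc=8.0 + 0.8 * emotions, scale=1.2, size=emotions.size)

for name, pos in [("critical", critical), ("non-critical", non_critical)]:
    w, b = variability(pos, emotions)
    print(f"{name:12s} within-emotion SD = {w:.2f} mm, between-emotion SD = {b:.2f} mm")
```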


Conference of the International Speech Communication Association | 2016

Illustrating the Production of the International Phonetic Alphabet Sounds Using Fast Real-Time Magnetic Resonance Imaging.

Asterios Toutios; Sajan Goud Lingala; Colin Vaz; Jangwon Kim; John H. Esling; Patricia A. Keating; Matthew Gordon; Dani Byrd; Louis Goldstein; Krishna S. Nayak; Shrikanth Narayanan

Recent advances in real-time magnetic resonance imaging (rtMRI) of the upper airway for acquiring speech production data provide unparalleled views of the dynamics of a speaker’s vocal tract at very high frame rates (83 frames per second and even higher). This paper introduces an effort to collect and make available on-line rtMRI data corresponding to a large subset of the sounds of the world’s languages as encoded in the International Phonetic Alphabet, with supplementary English words and phonetically-balanced texts, produced by four prominent phoneticians, using the latest rtMRI technology. The technique images oral as well as laryngeal articulator movements in the production of each sound category. This resource is envisioned as a teaching tool in pronunciation training, second language acquisition, and speech therapy.


Journal of the Acoustical Society of America | 2014

A comparison study of emotional speech articulations using the principal component analysis method

Sungbok Lee; Jangwon Kim; Shrikanth Narayanan

In this study, we investigate differences in tongue movements across emotions using principal component analysis (PCA), which makes it possible to detect major and minor variations in the entire tongue surface movement under different speaking conditions. For the purpose of the study, we analyze an acted emotional speech production corpus collected from one actor and two actresses using electromagnetic articulography (EMA). The discrete emotion types considered in this study are anger, sadness, and happiness, as well as a neutral one as reference. Specifically, we investigate the number of principal components needed to capture emotional variations, and the differences in tongue shaping across emotion types. The outcome of the study would provide supplementary information to previous PCA-based production studies, which have mainly focused on normal, or neutral, speech articulation. The major principal components found in the study can also be utilized as a basis from which effective but compact articulatory correlates (i.e., component scores) can be derived for further investigation of emotional speech production, such as the interaction between articulatory kinematics and prosodic modulations of pitch and loudness patterns, which is another important motivation of the study.
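
A minimal sketch of the kind of PCA analysis described, using scikit-learn on fabricated EMA tongue-sensor frames; the sensor count, frame counts, and the 95% variance criterion are assumptions, not the study's settings.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)

# Fabricated EMA frames: 3 tongue sensors x (x, y) = 6 values per frame,
# pooled over utterances of one emotion category; mixing induces correlations.
frames = rng.normal(size=(2000, 6)) @ rng.normal(size=(6, 6))

pca = PCA().fit(frames)
explained = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.searchsorted(explained, 0.95) + 1)
print("components needed for 95% of tongue-movement variance:", n_components)

# Component scores can then serve as compact articulatory correlates, e.g. to
# compare tongue shaping across emotions or relate kinematics to prosody.
scores = pca.transform(frames)[:, :n_components]
print("score matrix shape:", scores.shape)
```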


Journal of the Acoustical Society of America | 2012

Co-registration of articulographic and real-time magnetic resonance imaging data for multimodal analysis of rapid speech

Jangwon Kim; Adam C. Lammert; Michael Proctor; Shrikanth Narayanan

We propose a method for co-registering speech articulatory/acoustic data from two modalities that provide complementary advantages. Electromagnetic Articulography (EMA) provides high temporal resolution (100 samples/second in the WAVE system) and flesh-point tracking, while real-time Magnetic Resonance Imaging (rtMRI, 23 frames/second) offers a complete midsagittal view of the vocal tract, including articulated structures and the articulatory environment. Co-registration was achieved through iterative alignment in the acoustic and articulatory domains. Acoustic signals were aligned temporally using Dynamic Time Warping, while articulatory signals were aligned variously by minimization of mean total error between articulometry data and estimated corresponding flesh points and by using mutual information derived from articulatory parameters for each sentence. We demonstrate our method on a subset of the TIMIT corpus elicited from a male and a female speaker of American English, and illustrate the benefits o...

Collaboration


Dive into Jangwon Kim's collaborations.

Top Co-Authors

Shrikanth Narayanan (University of Southern California)
Sungbok Lee (University of Southern California)
Adam C. Lammert (University of Southern California)
Asterios Toutios (University of Southern California)
Naveen Kumar (University of Southern California)
Krishna S. Nayak (University of Southern California)
Maarten Van Segbroeck (University of Southern California)
Ming Li (University of Southern California)
Rahul Gupta (University of Southern California)
Vikram Ramanarayanan (University of Southern California)