
Publications


Featured research published by Seong-Jun Hahm.


Computer Speech & Language | 2013

Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds

Marc Delcroix; Keisuke Kinoshita; Tomohiro Nakatani; Shoko Araki; Atsunori Ogawa; Takaaki Hori; Shinji Watanabe; Masakiyo Fujimoto; Takuya Yoshioka; Takanobu Oba; Yotaro Kubo; Mehrez Souden; Seong-Jun Hahm; Atsushi Nakamura

Research on noise-robust speech recognition has mainly focused on relatively stationary noise that may differ from the noise conditions found in most living environments. In this paper, we introduce a recognition system that can recognize speech in the presence of multiple rapidly time-varying noise sources, as found in a typical family living room. To deal with such severe noise conditions, our recognition system exploits all available information about speech and noise, that is, spatial (directional), spectral, and temporal information. This is realized with a model-based speech enhancement pre-processor consisting of two complementary elements: a multi-channel speech/noise separation method that exploits spatial and spectral information, followed by a single-channel enhancement algorithm that uses the long-term temporal characteristics of speech obtained from clean speech examples. Moreover, to compensate for any mismatch that may remain between the enhanced speech and the acoustic model, our system employs an adaptation technique that combines conventional maximum likelihood linear regression with dynamic adaptive compensation of the variances of the Gaussians of the acoustic model. Our proposed system approaches human performance levels by greatly improving the audible quality of speech and substantially improving keyword recognition accuracy.
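
As a rough illustration of the dynamic variance compensation step, the idea can be pictured as enlarging each Gaussian's variance, frame by frame, by an uncertainty supplied by the enhancement front-end. The Python sketch below assumes diagonal covariances and a front-end that emits a per-frame feature variance; the function and variable names are ours, not the paper's.

    import numpy as np

    def loglik_with_dynamic_variance(x, mu, var_model, var_enh):
        # x: enhanced feature frame; mu, var_model: Gaussian mean and
        # (diagonal) variance; var_enh: per-frame enhancement uncertainty.
        # Dynamic variance compensation: enlarge the model variance by
        # the frame-dependent uncertainty before scoring.
        var = var_model + var_enh
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)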


Conference of the International Speech Communication Association | 2015

Recognizing Dysarthric Speech due to Amyotrophic Lateral Sclerosis with Across-Speaker Articulatory Normalization

Seong-Jun Hahm; Daragh Heitzman; Jun Wang

Recent dysarthric speech recognition studies using mixed data from a collection of neurological diseases suggested that articulatory data can help improve speech recognition performance. This project was specifically designed for speaker-independent recognition of dysarthric speech due to amyotrophic lateral sclerosis (ALS) using articulatory data. In this paper, we investigated three across-speaker normalization approaches in the acoustic space, the articulatory space, and both: Procrustes matching (a physiological approach in articulatory space), vocal tract length normalization (a data-driven approach in acoustic space), and feature-space maximum likelihood linear regression (a model-based approach for both spaces), to address the high degree of articulatory variation across speakers. A preliminary ALS data set was collected and used to evaluate the approaches. Two recognizers, Gaussian mixture model (GMM)-hidden Markov model (HMM) and deep neural network (DNN)-HMM, were used. Experimental results showed that adding articulatory data significantly reduced the phoneme error rates (PERs) with any individual or combined normalization approach. DNN-HMM outperformed GMM-HMM in all configurations. The best performance (30.7% PER) was obtained with a triphone DNN-HMM, acoustic and articulatory data, and all three normalization approaches, a 15.3% absolute PER reduction from the baseline triphone GMM-HMM using acoustic data only. Index Terms: dysarthric speech recognition, Procrustes matching, vocal tract length normalization, fMLLR, hidden Markov models, deep neural network
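
Of the three normalization approaches, Procrustes matching is the easiest to make concrete: each speaker's articulatory point set is translated, scaled, and rotated onto a reference shape. Below is a minimal sketch assuming sensor positions arrive as an N x d array; the helper name and the simple norm-ratio scaling are our choices, not the paper's.

    import numpy as np

    def procrustes_align(src, ref):
        # src, ref: (N, d) arrays of corresponding articulatory points
        # (e.g., tongue and lip sensor positions).
        src_c = src - src.mean(axis=0)   # remove translation
        ref_c = ref - ref.mean(axis=0)
        # Optimal rotation from the SVD of the cross-covariance matrix.
        u, _, vt = np.linalg.svd(src_c.T @ ref_c)
        rotation = u @ vt
        # Simple uniform scaling: match overall point-set size.
        scale = np.linalg.norm(ref_c) / np.linalg.norm(src_c)
        return scale * (src_c @ rotation) + ref.mean(axis=0)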


International Conference on Acoustics, Speech, and Signal Processing | 2013

Feature space variational Bayesian linear regression and its combination with model space VBLR

Seong-Jun Hahm; Atsunori Ogawa; Marc Delcroix; Masakiyo Fujimoto; Takaaki Hori; Atsushi Nakamura

In this paper, we propose a tuning-free Bayesian linear regression approach for speaker adaptation. We first formulate feature-space variational Bayesian linear regression (fVBLR). Using a lower bound as the objective function, we can optimize the binary regression-tree structure and the control parameters for prior density scaling. We experimentally verified that the proposed fVBLR achieves performance comparable to that of conventional, carefully tuned fSMAPLR and SMAPLR. For further performance improvement regardless of the amount of adaptation data, we combine fVBLR with model-space VBLR (fVBLR+VBLR), so that feature-space normalization and model-space adaptation are performed consistently within a variational Bayesian framework without any tuning parameters. In our experiments, the proposed fVBLR+VBLR outperformed both fVBLR and VBLR.
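
In outline, and in our notation rather than the paper's: the feature-space transform is affine, and VB replaces the point estimate of the transform with a posterior optimized through a lower bound on the marginal likelihood of the adaptation data X:

    \hat{x}_t = A x_t + b = W \xi_t, \qquad \xi_t = [x_t^\top \; 1]^\top

    \log p(X) \;\ge\; \mathcal{L}(q)
      = \mathbb{E}_{q(W)}\!\left[\log p(X \mid W)\right]
        - \mathrm{KL}\!\left(q(W) \,\|\, p(W)\right)

The same bound L(q) then serves as the model-selection score for the binary tree structure and the prior scaling, which is what removes the hand-tuned parameters.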


International Conference on Acoustics, Speech, and Signal Processing | 2010

Aspect-model-based reference speaker weighting

Seong-Jun Hahm; Yuichi Ohkawa; Masashi Ito; Motoyuki Suzuki; Akinori Ito; Shozo Makino

We propose aspect-model-based reference speaker weighting. The main idea of the approach is that the adapted model is a linear combination of a set of reference speaker models, as in reference speaker weighting (RSW) and eigenvoices. The aspect model is a mixture model of speaker-dependent (SD) models. In this paper, aspect model weighting (AMW) is proposed for finding an optimal weighting of the set of reference speakers; unlike in RSW, the aspect model, a kind of cluster model, is trained by likelihood maximization with respect to the training data. The aspect-model approach also reduces the number of adaptation parameters. For evaluation, we carried out an isolated-word recognition experiment on a Korean database (KLE452) and compared the results with those of conventional MAP, MLLR, RSW, and eigenvoice adaptation. Even with only 0.5 s of adaptation data, we achieved a 27.24% relative error rate reduction over the speaker-independent (SI) baseline.
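
The idea shared by RSW, eigenvoices, and the proposed AMW is that adaptation reduces to estimating a small set of combination weights over reference-speaker models. A minimal sketch of that shared step, with our own array layout:

    import numpy as np

    def combine_reference_speakers(ref_means, weights):
        # ref_means: (R, G, D) array of Gaussian mean vectors for R
        # reference speakers, G Gaussians, and D feature dimensions.
        # weights: length-R combination weights (normalized below).
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
        # Adapted means: per-Gaussian weighted sum over the speakers.
        return np.tensordot(w, ref_means, axes=1)   # shape (G, D)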


Intelligent Information Hiding and Multimedia Signal Processing | 2011

Manipulating Vocal Signal in Mixed Music Sounds Using Small Amount of Side Information

Yuto Sasaki; Seong-Jun Hahm; Akinori Ito

In this paper, we propose a method for manipulating the vocal sound in mixed music signals using a small amount of side information. In the proposed method, the fundamental frequency (F0) of the vocal signal is used as the side information; it is estimated from the target vocal signal before that signal is mixed with the backing-track signals. After receiving the mixed music signal, the vocal sound is manipulated with a comb filter driven by the transmitted F0 information. We evaluated the performance in terms of signal-to-noise ratio (SNR) and PEAQ, and also measured the influence of the quantization bit rate on the average error of the F0 information.
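
As a rough sketch of the manipulation step, a feed-forward comb filter delayed by one pitch period attenuates (or, with the opposite sign, boosts) the harmonics of the vocal part. The parameter names and the fixed-F0 simplification are ours; the paper tracks a time-varying F0.

    import numpy as np

    def comb_filter(x, f0, sample_rate, alpha=0.9):
        # Feed-forward comb filter: y[n] = x[n] - alpha * x[n - T],
        # with T one pitch period, so the notches fall on the vocal
        # harmonics (flip the sign of alpha to emphasize them instead).
        period = int(round(sample_rate / f0))
        y = x.astype(float).copy()
        y[period:] -= alpha * x[:-period]
        return y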


2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications Workshops | 2011

Utterance Classification for Combination of Multiple Simple Dialog Systems

Seong-Jun Hahm; Akinori Ito; Kentaro Awano; Masashi Ito; Shozo Makino

This paper describes an utterance classification method for combining multiple dialog systems. To reduce the effort of developing spoken dialog systems, several systems have been proposed that do not require complicated dialog descriptions. However, these systems are so simple that they accept only very limited types of dialog. We propose building a dialog system that accepts more flexible dialogs by combining such simple dialog systems, with the combination based on utterance classification. In an utterance classification experiment, 77.1% of the utterances, including out-of-task utterances, were correctly classified.
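
As a toy illustration of routing by utterance classification, one could score a recognized utterance against each subsystem's vocabulary and fall back to an out-of-task label below a threshold. This is our own simplification for illustration, not the classifier used in the paper.

    def route_utterance(tokens, task_vocab, threshold=0.2):
        # tokens: recognized words; task_vocab: dict mapping each simple
        # dialog system to its vocabulary (a set of words).
        scores = {task: len(set(tokens) & vocab) / max(len(tokens), 1)
                  for task, vocab in task_vocab.items()}
        best = max(scores, key=scores.get)
        # Below the threshold, treat the utterance as out-of-task.
        return best if scores[best] >= threshold else "out-of-task"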


International Conference on Acoustics, Speech, and Signal Processing | 2013

Unsupervised discriminative adaptation using differenced maximum mutual information based linear regression

Marc Delcroix; Atsunori Ogawa; Seong-Jun Hahm; Tomohiro Nakatani; Atsushi Nakamura

This paper proposes a new approach for unsupervised model adaptation using a discriminative criterion. Discriminative criteria for acoustic model training have been widely used and provide significantly better performance than maximum-likelihood training. However, discriminative criteria are sensitive to errors in the reference transcriptions, which limits their applicability to unsupervised adaptation. In this paper, we apply the recently proposed differenced maximum mutual information (dMMI) criterion to unsupervised linear-regression-based adaptation, because dMMI has an intrinsic mechanism that mitigates the influence of transcription errors. We report unsupervised adaptation results for a large-vocabulary continuous speech recognition task that show a significant improvement over maximum-likelihood-based linear regression.
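
The surrounding recipe is the usual two-pass one; only the criterion used to estimate the transform changes. A schematic sketch, with every name ours:

    def unsupervised_adapt(decode, estimate_transform, apply_transform,
                           model, utterances):
        # First pass: decode with the unadapted model; the hypotheses
        # act as error-prone reference transcriptions.
        hypotheses = [decode(model, utt) for utt in utterances]
        # Estimate a linear-regression transform against the hypotheses;
        # the criterion plugged in here could be ML (MLLR) or dMMI.
        transform = estimate_transform(model, utterances, hypotheses)
        # Second pass: recognize with the adapted model.
        return apply_transform(model, transform)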


Computer Speech & Language | 2016

Differenced maximum mutual information criterion for robust unsupervised acoustic model adaptation

Marc Delcroix; Atsunori Ogawa; Seong-Jun Hahm; Tomohiro Nakatani; Atsushi Nakamura

Highlights:
- The differenced MMI (dMMI) is a discriminative criterion that generalizes MPE and BMMI.
- We discuss the behavior of dMMI when there are errors in the transcription labels.
- dMMI may be less sensitive to such errors than other criteria.
- We support this claim with unsupervised speaker adaptation experiments.
- dMMI-based adaptation achieves significant gains over MLLR in two LVCSR tasks.

Discriminative criteria have been widely used for training acoustic models for automatic speech recognition (ASR). Many discriminative criteria have been proposed, including maximum mutual information (MMI), minimum phone error (MPE), and boosted MMI (BMMI). Discriminative training is known to provide significant performance gains over conventional maximum-likelihood (ML) training. However, as discriminative criteria aim at direct minimization of the classification error, they rely strongly on having accurate reference labels, and errors in the reference labels directly affect performance. Recently, the differenced MMI (dMMI) criterion has been proposed as a generalization of conventional criteria such as BMMI and MPE: dMMI approaches BMMI or MPE when its hyper-parameters are set appropriately, and it also provides intermediate criteria that can be interpreted as smoothed versions of BMMI or MPE, which are robust to errors in the reference labels. In this paper, we demonstrate the effect of dMMI on unsupervised speaker adaptation, where the reference labels are estimated from a first recognition pass and thus inevitably contain errors. In particular, we introduce dMMI-based linear regression (dMMI-LR) adaptation and demonstrate significant gains in performance compared with MLLR and BMMI-LR in two large-vocabulary lecture recognition tasks.
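
For orientation, the dMMI objective can be sketched (in our notation, reconstructed from the description above; sign and scaling conventions differ across papers) as a difference of two boosted-MMI-style log-sums with margins sigma_1 < sigma_2:

    \mathcal{F}_{\mathrm{dMMI}}^{\,\sigma_1,\sigma_2}(\Lambda)
      = \frac{1}{\sigma_2-\sigma_1}\,
        \log \frac{\sum_{s} p_{\Lambda}(X \mid s)^{\kappa}\, P(s)\, e^{\sigma_2 A(s,\,s_r)}}
                  {\sum_{s} p_{\Lambda}(X \mid s)^{\kappa}\, P(s)\, e^{\sigma_1 A(s,\,s_r)}}

Here A(s, s_r) is the accuracy of hypothesis s against the reference s_r, kappa is an acoustic scale, and Lambda denotes the model parameters; suitable limits of the margins recover BMMI- and MPE-like behavior, which is the generalization the highlights refer to.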


Archive | 2011

Speech recognition in the presence of highly non-stationary noise based on spatial, spectral and temporal speech/noise modeling combined with dynamic variance adaptation

Marc Delcroix; Keisuke Kinoshita; Tomohiro Nakatani; Shoko Araki; Atsunori Ogawa; Takaaki Hori; Shinji Watanabe; Masakiyo Fujimoto; Takuya Yoshioka; Takanobu Oba; Yotaro Kubo; Mehrez Souden; Seong-Jun Hahm; Atsushi Nakamura


International Symposium/Conference on Music Information Retrieval | 2011

A System for Evaluating Singing Enthusiasm for Karaoke.

Ryunosuke Daido; Seong-Jun Hahm; Masashi Ito; Shozo Makino; Akinori Ito

Collaboration


Dive into Seong-Jun Hahm's collaboration. Top co-authors:

Atsushi Nakamura (Nippon Telegraph and Telephone)
Marc Delcroix (Nippon Telegraph and Telephone)
Masashi Ito (Tohoku Institute of Technology)
Jun Wang (University of Texas at Dallas)
Tomohiro Nakatani (Nippon Telegraph and Telephone)