Publications


Featured research published by Hoon Chung.


IEEE Transactions on Consumer Electronics | 2008

Fast speech recognition to access a very large list of items on embedded devices

Hoon Chung; Jeon Gue Park; Yun Keun Lee; Ikjoo Chung

In this paper, we propose a fast decoding algorithm to recognize a very large number of item names on a resource-limited embedded device. The proposed algorithm is based on a multi-pass search scheme composed of a two-stage HMM-based coarse match and a detailed match. The coarse match rapidly selects a small set of candidates that is assumed to contain the correct hypothesis with high probability, and the detailed match re-ranks the candidates by acoustic rescoring. The algorithm is implemented on an in-car navigation system with a 32-bit fixed-point processor operating at 620 MHz. Experimental results show that the proposed method runs at up to 1.74 times real time on the embedded device while minimizing the degradation of recognition accuracy on a 220K-entry Korean Point-of-Interest (POI) recognition domain.
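The coarse-then-detailed idea can be illustrated with a minimal sketch: a cheap scorer prunes a large item list to a few candidates, and an expensive scorer re-ranks only the survivors. The scoring functions below are toy stand-ins, not the paper's HMM-based matchers.

```python
# Two-pass search sketch: cheap coarse pruning, then expensive re-ranking.
# coarse_score and detailed_score are illustrative proxies, not real acoustics.

def coarse_score(query, item):
    # Cheap proxy: shared-character overlap (stands in for the coarse HMM match).
    return len(set(query) & set(item))

def detailed_score(query, item):
    # Expensive stand-in for acoustic rescoring: position-wise match ratio.
    matches = sum(1 for a, b in zip(query, item) if a == b)
    return matches / max(len(query), len(item))

def multipass_search(query, items, n_candidates=3):
    # Pass 1: keep only the top-N candidates under the coarse score.
    candidates = sorted(items, key=lambda it: coarse_score(query, it), reverse=True)
    candidates = candidates[:n_candidates]
    # Pass 2: re-rank the surviving candidates with the detailed score.
    return max(candidates, key=lambda it: detailed_score(query, it))

items = ["city hall", "central park", "city museum", "airport"]
best = multipass_search("city hal", items)
```

The speedup comes from running the detailed scorer on only `n_candidates` items rather than the full 220K-entry list.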


International Symposium on Neural Networks | 2016

Deep neural network using trainable activation functions

Hoon Chung; Sung Joo Lee; Jeon Gue Park

This paper proposes trainable activation functions for deep neural networks (DNNs). A DNN is a feed-forward neural network composed of more than one hidden nonlinear layer, characterized by a set of weight matrices, bias vectors, and a nonlinear activation function. During training, the weight matrices and bias vectors are updated by the error back-propagation algorithm, but the activation function is not; it is simply fixed empirically. Many rectifier-type nonlinear functions have been proposed as activation functions, but the best nonlinear function for a given task domain remains unknown. To address this issue, we propose a trainable activation function: conventional nonlinear activation functions are approximated by a Taylor series, and the coefficients are trained simultaneously with the other parameters. The effectiveness of the proposed approach was evaluated on the MNIST handwritten digit recognition domain.
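A minimal sketch of the idea, under the assumption that the activation is parameterized as a polynomial f(x) = Σ c_k x^k (the paper's exact parameterization may differ): the coefficients are initialized from the Taylor expansion of tanh and then updated by gradient descent like any other parameter.

```python
# Trainable polynomial activation sketch: coefficients start at the Taylor
# expansion of tanh(x) ≈ x - x^3/3 and are updated by gradient descent.

def poly_act(coeffs, x):
    return sum(c * x**k for k, c in enumerate(coeffs))

def grad_coeffs(coeffs, x, dloss_df):
    # df/dc_k = x^k, so the chain rule gives dL/dc_k = (dL/df) * x^k.
    return [dloss_df * x**k for k in range(len(coeffs))]

coeffs = [0.0, 1.0, 0.0, -1.0 / 3.0]  # Taylor coefficients of tanh around 0

# Toy training step: push f(0.5) toward a target of 0.4 with squared loss.
x, target, lr = 0.5, 0.4, 0.1
for _ in range(200):
    out = poly_act(coeffs, x)
    dloss_df = 2.0 * (out - target)          # derivative of (f - target)^2
    g = grad_coeffs(coeffs, x, dloss_df)
    coeffs = [c - lr * gk for c, gk in zip(coeffs, g)]

final = poly_act(coeffs, x)
```

In a real DNN the same chain rule is applied per layer, with dL/df coming from back-propagation.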


Spoken Language Technology Workshop | 2016

Deep neural network based acoustic model parameter reduction using manifold regularized low rank matrix factorization

Hoon Chung; Jeom Ja Kang; Ki Young Park; Sung Joo Lee; Jeon Gue Park

In this paper, we propose a deep neural network (DNN) model parameter reduction method based on manifold regularized low rank matrix factorization, which reduces the computational complexity of the acoustic model for low-resource embedded devices. One of the most common DNN parameter reduction techniques is truncated singular value decomposition (TSVD), which reduces the number of parameters by approximating a target matrix with a low rank one in terms of minimizing the Euclidean norm. In this work, we question whether the Euclidean norm is an appropriate objective function for factorizing DNN matrices, because DNNs are known to learn the nonlinear manifold of acoustic features. To exploit this manifold structure for robust parameter reduction, we propose a manifold regularized matrix factorization approach. The proposed method was evaluated on the TIMIT phone recognition domain.
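The TSVD baseline the paper builds on can be sketched as follows: an m x n weight matrix is replaced by two thin rank-r factors, shrinking m*n parameters to r*(m+n). This shows only the baseline; the manifold-regularized objective itself is not reproduced here.

```python
# TSVD parameter-reduction sketch: approximate W by rank-r factors A (m x r)
# and B (r x n), so a layer stores r*(m+n) parameters instead of m*n.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 32, 8
W = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # near rank-r
W += 0.01 * rng.standard_normal((m, n))                        # small noise

U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]          # m x r factor (singular values folded in)
B = Vt[:r, :]                 # r x n factor
W_hat = A @ B                 # low-rank approximation of W

params_full = m * n
params_low = r * (m + n)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
```

At inference time the single matrix-vector product Wx is replaced by two cheaper products A(Bx), which is where the complexity reduction comes from.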


International Symposium on Neural Networks | 2017

Phonetic state relation graph regularized deep neural network for robust acoustic model

Hoon Chung; Yoo Rhee Oh; Sung Joo Lee; Jeon Gue Park

In this paper, we propose a phonetic state relation graph regularized Deep Neural Network (DNN) for a robust acoustic model. A DNN-based acoustic model is trained by minimizing a cost function that is usually penalized by regularization terms. Regularization generally reflects prior knowledge, which constrains the model parameter space. Various regularizations have been proposed to improve the robustness of DNN-based acoustic models, but most do not exploit knowledge of speech generation, even though this process is the most fundamental prior. For example, l1- and l2-norm regularizations are equivalent to placing Laplacian and Gaussian priors, respectively, on the model parameters, so no speech-specific knowledge is used. Manifold-based regularization exploits the local linear structure of observed acoustic features, which are simply realizations of the speech generation process. Therefore, to incorporate prior knowledge of speech generation into regularization, we propose a phonetic state relation graph based approach. The method was evaluated on the TIMIT phone recognition domain, where it reduced the phone error rate from 20.8% to 20.3% under the same conditions.
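A graph-based regularizer of this general kind penalizes the representations of related states for being far apart, typically as Ω = ½ Σ_ij w_ij ||h_i − h_j||². The sketch below assumes that form; the edge weights and "phonetic" graph are made-up toys, not the paper's actual construction.

```python
# Graph regularizer sketch: edges (i, j, w_ij) connect related states, and the
# penalty grows when connected states have distant representations.

def graph_penalty(H, edges):
    """H: list of representation vectors; edges: list of (i, j, w_ij)."""
    total = 0.0
    for i, j, w in edges:
        total += w * sum((a - b) ** 2 for a, b in zip(H[i], H[j]))
    return 0.5 * total

H = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # states 0 and 1 are "related"
edges = [(0, 1, 1.0), (1, 2, 0.1)]         # toy phonetic relation graph
penalty = graph_penalty(H, edges)
```

During training this penalty would be added to the cross-entropy cost, so gradient descent pulls related states' representations together.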


IEEE International Conference on Computer Communication and Internet | 2016

I-vector based utterance verification for large-vocabulary speech recognition system

Woo Yong Choi; Hwa Jeon Song; Hoon Chung; Jeomja Kang; Jeon Gue Park

This paper proposes a new Utterance Verification (UV) algorithm based on i-vectors. Phone segments are extracted and concatenated from the training data and used to train a Universal Background Model (UBM) and a Total Variability (TV) matrix; i-vectors are then extracted from the enrollment and evaluation data using the UBM and TV matrix. We compare two Confidence Measures (CMs): cosine distance scoring and a Support Vector Machine (SVM). To compensate for channel effects, we use two channel compensation methods, Linear Discriminant Analysis (LDA) and Within-Class Covariance Normalization (WCCN). The decision is made with a word-level CM obtained by combining the phone-level CMs. Experiments are conducted on a Korean isolated word recognition domain. Experimental results show that the SVM is superior to cosine distance scoring, with the best performance achieved when the SVM is used without any channel compensation method.
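Of the two confidence measures, cosine distance scoring is the simpler: accept when the cosine similarity between the enrollment and test i-vectors exceeds a threshold. The vectors and threshold below are illustrative, not values from the paper.

```python
# Cosine distance scoring sketch for verification: compare a test i-vector to
# the enrollment i-vector and threshold the similarity.
import math

def cosine_score(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def verify(enroll_ivec, test_ivec, threshold=0.5):
    # Accept the utterance when similarity clears the threshold.
    return cosine_score(enroll_ivec, test_ivec) >= threshold

enroll = [0.6, 0.8, 0.0]       # toy enrollment i-vector
genuine = [0.5, 0.85, 0.1]     # nearby vector: should be accepted
impostor = [-0.7, 0.1, 0.7]    # distant vector: should be rejected
```

An SVM-based CM would instead learn a decision boundary over the i-vectors, which is what the paper finds to be more accurate.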


European Signal Processing Conference | 2015

A useful feature-engineering approach for a LVCSR system based on CD-DNN-HMM algorithm

Sung Joo Lee; Byung Ok Kang; Hoon Chung; Jeon Gue Park

In this paper, we propose a useful feature-engineering approach for Context-Dependent Deep-Neural-Network Hidden-Markov-Model (CD-DNN-HMM) based Large-Vocabulary Continuous-Speech-Recognition (LVCSR) systems. The speech recognition performance of an LVCSR system is improved from two feature-engineering perspectives. The first improvement is achieved by adopting intra/inter-frame feature subsets when building the Gaussian-Mixture-Model (GMM) HMMs used for HMM state-level alignment. The second gain comes from additional features that augment the front end of the DNN. We evaluate the effectiveness of our approach on a series of Korean speech recognition tasks (isolated single-syllable recognition with a medium-sized speech corpus and conversational speech recognition with a large database) using the Kaldi speech recognition toolkit. The results show that the proposed feature-engineering approach outperforms the traditional Mel-Frequency Cepstral Coefficient (MFCC) GMM + Mel-frequency filter-bank output DNN method.
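Augmenting a DNN front end usually means splicing neighboring frames into each input vector and appending extra per-frame features. The sketch below shows that mechanic with toy numbers; the frame values and the auxiliary feature are illustrative, not the paper's actual feature sets.

```python
# Front-end feature-engineering sketch: (1) splice context frames into the DNN
# input, (2) append an auxiliary per-frame scalar to each spliced vector.

def splice(frames, context=1):
    """Concatenate each frame with `context` frames on each side (edge-padded)."""
    n = len(frames)
    out = []
    for t in range(n):
        spliced = []
        for dt in range(-context, context + 1):
            idx = min(max(t + dt, 0), n - 1)   # clamp at utterance edges
            spliced.extend(frames[idx])
        out.append(spliced)
    return out

def augment(spliced_frames, aux):
    """Append one auxiliary value per frame to its spliced vector."""
    return [vec + [a] for vec, a in zip(spliced_frames, aux)]

frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # 3 toy frames, 2 dims each
aux = [1.0, 2.0, 3.0]                           # toy auxiliary feature
dnn_input = augment(splice(frames, context=1), aux)
```

Each resulting vector has 2 dims x 3 spliced frames + 1 auxiliary value = 7 dims, which would be the DNN's input dimensionality here.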


Natural Interaction with Robots, Knowbots and Smartphones, Putting Spoken Dialog Systems into Practice | 2014

Frame-Level Selective Decoding Using Native and Non-native Acoustic Models for Robust Speech Recognition to Native and Non-native Speech

Yoo Rhee Oh; Hoon Chung; Jeomja Kang; Yun Keun Lee

This paper proposes a frame-level selective-decoding method that uses both native acoustic models (AMs) and non-native AMs to build a speech recognition system that is robust to non-native as well as native speech. To this end, we use two kinds of well-trained AMs: (a) AMs trained on a large amount of native speech (native AMs) and (b) AMs trained on a large amount of non-native speech (non-native AMs). First, each speech feature vector is decoded using the native AMs and non-native AMs in parallel, and the proper AMs are selected by comparing the likelihoods of the two. The next M frames of speech feature vectors are then decoded using the selected AMs, where M is a pre-defined parameter. The selection and decoding procedures are repeated until the end of the utterance. Automatic speech recognition (ASR) experiments on English spoken by Korean speakers show that a system employing the proposed method reduces the average word error rate (WER) by 16.6% and 41.3% for English spoken by Koreans and by native English speakers, respectively, compared to a system employing an utterance-level selective-decoding method.
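The selection loop above can be sketched in a few lines: every M frames, compare the two models' likelihoods and let the winner decode the next M frames. The likelihood functions and frame values below are toy stand-ins, not real acoustic models.

```python
# Frame-level selective decoding sketch: re-select the model every M frames
# based on which one assigns the current frame a higher likelihood.

def selective_decode(frames, native_ll, nonnative_ll, M=2):
    """Return, per frame, which model ('native'/'nonnative') decoded it."""
    choice, chosen = [], None
    for t, frame in enumerate(frames):
        if t % M == 0:  # re-selection point every M frames
            chosen = "native" if native_ll(frame) >= nonnative_ll(frame) else "nonnative"
        choice.append(chosen)
    return choice

# Toy likelihoods: the "native" model prefers positive frame values,
# the "non-native" model prefers negative ones.
native_ll = lambda x: x
nonnative_ll = lambda x: -x
frames = [1.0, 0.5, -2.0, -1.0, 3.0]
path = selective_decode(frames, native_ll, nonnative_ll, M=2)
```

Utterance-level selection is the M = len(frames) special case: one decision for the whole utterance, which is what the frame-level scheme improves on.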


Archive | 2008

Method and apparatus for recognizing speech

Hyung Bae Jeon; Kyu Woong Hwang; Seung Hi Kim; Hoon Chung; Jun Park; Yun Keun Lee


Archive | 2008

Microphone array based speech recognition system and target speech extracting method of the system

Hoon Young Cho; Yun Keun Lee; Jeom Ja Kang; Byung Ok Kang; Kap Kee Kim; Sung Joo Lee; Ho Young Jung; Hoon Chung; Jeon Gue Park; Hyung Bae Jeon


Archive | 2007

Automatic translation method and system based on corresponding sentence pattern

Jun Park; Seung Hi Kim; Hyung Bae Jeon; Young Jik Lee; Hoon Chung

Collaboration


Dive into Hoon Chung's collaborations.

Top Co-Authors (all affiliated with the Electronics and Telecommunications Research Institute):

Jeon Gue Park
Sung Joo Lee
Yunkeun Lee
Byung Ok Kang
Ho-Young Jung
Hyung-Bae Jeon
Jeom Ja Kang
Yun Keun Lee
Ki-Young Park
Euisok Chung