Publication


Featured research published by Hoirin Kim.


IEEE Signal Processing Letters | 2007

Probabilistic Class Histogram Equalization for Robust Speech Recognition

Youngjoo Suh; Mikyong Ji; Hoirin Kim

In this letter, a probabilistic class histogram equalization method is proposed to compensate for an acoustic mismatch in noise robust speech recognition. The proposed method aims not only to compensate for the acoustic mismatch between training and test environments but also to reduce the limitations of the conventional histogram equalization. It utilizes multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes by means of soft classification with a Gaussian mixture model, and equalizes the features by using their corresponding class-specific distributions. Experiments on the Aurora 2 task confirm the superiority of the proposed approach in acoustic feature compensation.
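The pipeline described in this abstract (soft classification with a Gaussian mixture, then equalization through class-specific distributions) can be sketched numerically. This is a minimal numpy illustration, not the paper's implementation: the function names, the scalar features, the single global test CDF, and the fixed mixture parameters are all simplifying assumptions.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Scalar Gaussian density, used for soft class posteriors."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def equalize(x, test_feats, ref_feats_by_class, means, sigmas, priors):
    """Soft class-based histogram equalization of one scalar feature x.

    means/sigmas/priors describe an (assumed pre-trained) GMM over the
    noisy test feature space; ref_feats_by_class holds clean training
    features per class, standing in for the class reference CDFs.
    """
    # soft classification: posterior probability of each class given x
    lik = np.array([p * gaussian_pdf(x, m, s)
                    for p, m, s in zip(priors, means, sigmas)])
    post = lik / lik.sum()
    # empirical test CDF value of x over the noisy utterance
    F_test = (test_feats <= x).mean()
    # map through each class-specific reference inverse CDF, combine softly
    out = 0.0
    for k, ref in enumerate(ref_feats_by_class):
        out += post[k] * np.quantile(ref, F_test)
    return out
```

Features that the mixture assigns to a class are thus pulled toward that class's clean reference distribution rather than a single global one, which is the stated advantage over conventional histogram equalization.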


IEEE Signal Processing Letters | 2012

Multiple Acoustic Model-Based Discriminative Likelihood Ratio Weighting for Voice Activity Detection

Youngjoo Suh; Hoirin Kim

In this letter, we propose a novel statistical voice activity detection (VAD) technique. The proposed technique employs probabilistically derived multiple acoustic models to effectively optimize the weights on frequency-domain likelihood ratios with a discriminative training approach. Experiments performed on various AURORA noisy environments showed that the proposed approach produces meaningful performance improvements over conventional single acoustic model-based approaches.
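The frame-level decision rule behind this family of VAD methods can be sketched as follows. The per-bin log-likelihood ratio below is the standard Gaussian statistical-model form (a function of a-priori and a-posteriori SNR); in the paper the weights are discriminatively trained with multiple acoustic models, whereas here they are simply passed in fixed, and the function name is illustrative.

```python
import numpy as np

def vad_decision(snr_post, snr_prio, weights, threshold=0.0):
    """Weighted log-likelihood-ratio VAD decision for one frame.

    snr_post: a-posteriori SNR per frequency bin
    snr_prio: a-priori SNR per frequency bin
    weights:  per-bin weights (discriminatively trained in the paper,
              fixed here for illustration)
    """
    # per-bin log likelihood ratio under complex-Gaussian speech/noise models
    llr = snr_post * snr_prio / (1.0 + snr_prio) - np.log1p(snr_prio)
    # frequency-domain combination: weighted sum of per-bin log ratios
    score = np.dot(weights, llr)
    return score > threshold, score
```

Uniform weights recover the conventional single-model combination; the paper's contribution is learning non-uniform weights so that bins which discriminate speech from noise best dominate the decision.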


IEEE Transactions on Consumer Electronics | 2008

Text-Independent Speaker Identification using Soft Channel Selection in Home Robot Environments

Mikyong Ji; Sungtak Kim; Hoirin Kim; Ho-Sub Yoon

With the aim of achieving the best possible speaker identification rate in a distant-talking environment, we developed a multiple microphone-based text-independent speaker identification system using soft channel selection. The system selects and combines the per-channel identification results based on the reliability of each individual channel result using a single perceptron, allowing for a user-customized service with high identification accuracy in home robot environments. Experimental results show that the proposed system is effective in a distant-talking environment, thereby providing a speech interface for a wide range of potential hands-free applications in a ubiquitous environment.


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Automatic intelligibility assessment of dysarthric speech using phonologically-structured sparse linear model

Myung Jong Kim; Younggwan Kim; Hoirin Kim

This paper presents a new method for automatically assessing the speech intelligibility of patients with dysarthria, which is a motor speech disorder impeding the physical production of speech. The proposed method consists of two main steps: feature representation and prediction. In the feature representation step, the speech utterance is converted into a phone sequence using an automatic speech recognition technique and is then aligned with a canonical phone sequence from a pronunciation dictionary using a weighted finite state transducer to capture the pronunciation mappings such as match, substitution, and deletion. The histograms of the pronunciation mappings on a pre-defined word set are used as features. Next, in the prediction step, a structured sparse linear model incorporated with phonological knowledge is proposed that simultaneously addresses phonologically structured sparse feature selection and intelligibility prediction. Evaluation of the proposed method on a database of 109 speakers consisting of 94 dysarthric and 15 control speakers yielded a root mean square error of 8.14 against subjectively rated scores in the range of 0 to 100. This is a promising result, indicating that the system can be applied to help speech therapists diagnose the degree of speech disorder.
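The feature representation step aligns a recognized phone sequence against a canonical one and histograms the resulting pronunciation mappings. The paper does this with a weighted finite state transducer; the sketch below substitutes plain Levenshtein dynamic programming as a simplified stand-in, just to show how the match/substitution/deletion/insertion counts that feed the histogram features arise. The function name is hypothetical.

```python
def align_counts(hyp, ref):
    """Align a recognized phone sequence (hyp) to a canonical one (ref)
    with Levenshtein DP and count the pronunciation mapping types."""
    n, m = len(hyp), len(ref)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = D[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1])
            D[i][j] = min(sub, D[i - 1][j] + 1, D[i][j - 1] + 1)
    # backtrack along an optimal alignment, counting mapping types
    counts = {"match": 0, "sub": 0, "del": 0, "ins": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and D[i][j] == D[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1])):
            counts["match" if hyp[i - 1] == ref[j - 1] else "sub"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            counts["ins"] += 1  # extra phone in the hypothesis
            i -= 1
        else:
            counts["del"] += 1  # canonical phone the speaker dropped
            j -= 1
    return counts
```

Accumulating these counts per word over a pre-defined word set yields the histogram features that the sparse linear model then maps to an intelligibility score.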


IEEE Transactions on Multimedia | 2012

Audio-Based Objectionable Content Detection Using Discriminative Transforms of Time-Frequency Dynamics

Myung Jong Kim; Hoirin Kim

In this paper, the problem of detecting objectionable sounds, such as sexual screaming or moaning, to classify and block objectionable multimedia content is addressed. Objectionable sounds show distinctive characteristics, such as large temporal variations and fast spectral transitions, which differ from general audio signals such as speech and music. To represent these characteristics, segment-based two-dimensional Mel-frequency cepstral coefficients and histograms of gradient directions are used as a feature set to characterize the time-frequency dynamics within a long-range segment of the target signal. The extracted features are then transformed to lower-dimensional features, while preserving discriminative information, using linear discriminant analysis based on a combination of global and local Fisher criteria. A Gaussian mixture model is adopted to statistically represent objectionable and non-objectionable sounds, and test sounds are classified by using a likelihood ratio test. Evaluation of the proposed feature extraction method on a database of several hundred objectionable and non-objectionable sound clips yielded a precision/recall break-even point of 91.25%, a promising result showing that the system can complement image-based approaches to blocking such multimedia content.
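The final classification stage, a Gaussian mixture model per class plus a likelihood ratio test, can be sketched for scalar features. The mixture parameters and function names below are illustrative placeholders, not the paper's trained models.

```python
import numpy as np

def log_gmm(x, weights, means, sigmas):
    """Log-likelihood of a scalar x under a 1-D Gaussian mixture."""
    comp = (np.log(weights)
            - 0.5 * np.log(2 * np.pi * sigmas ** 2)
            - 0.5 * ((x - means) / sigmas) ** 2)
    return np.logaddexp.reduce(comp)  # log-sum-exp over components

def classify(x, model_obj, model_bg, tau=0.0):
    """Likelihood ratio test: objectionable iff
    log p(x | objectionable) - log p(x | non-objectionable) > tau."""
    llr = log_gmm(x, *model_obj) - log_gmm(x, *model_bg)
    return llr > tau, llr
```

In practice the per-frame log-likelihoods of a clip would be summed before thresholding, and the threshold tau trades precision against recall (the 91.25% figure is the point where the two meet).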


Content-Based Multimedia Indexing | 2011

Automatic extraction of pornographic contents using radon transform based audio features

Myung Jong Kim; Hoirin Kim

This paper focuses on the problem of classifying pornographic sounds, such as sexual screams or moans, to detect and block objectionable multimedia content. To represent the large temporal variations of pornographic sounds, we propose a novel feature extraction method based on the Radon transform. The Radon transform provides a way to extract the global trend of orientations in a 2-D region, so it can be applied to time-frequency spectrograms over long-range segments to capture the large temporal variations of sexual sounds. The Radon feature is extracted using histograms and the flux of Radon coefficients. We adopt a Gaussian mixture model to statistically represent pornographic and non-pornographic sounds, and test sounds are classified by using a likelihood ratio test. Evaluations on several hundred pornographic and non-pornographic sound clips indicate that the proposed features achieve satisfactory results, suggesting that this approach could be used as an alternative to image-based methods.


EURASIP Journal on Advances in Signal Processing | 2007

Compensating acoustic mismatch using class-based histogram equalization for robust speech recognition

Youngjoo Suh; Sungtak Kim; Hoirin Kim

A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims not only at compensating for an acoustic mismatch between training and test environments but also at reducing the two fundamental limitations of the conventional histogram equalization method: the discrepancy between the phonetic distributions of training and test speech data, and the nonmonotonic transformation caused by the acoustic mismatch. The algorithm employs multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes, and equalizes the features by using their corresponding class reference and test distributions. Minimum mean-square error log-spectral amplitude (MMSE-LSA)-based speech enhancement is added just prior to the baseline feature extraction to reduce the corruption by additive noise. The experiments on the Aurora2 database proved the effectiveness of the proposed method, yielding relative error reductions both over the mel-cepstral-based features and over the conventional histogram equalization method.


International Conference on Acoustics, Speech, and Signal Processing | 2016

Cross-acoustic transfer learning for sound event classification

Hyungjun Lim; Myung Jong Kim; Hoirin Kim

A well-trained acoustic model that effectively captures the characteristics of sound events is a critical factor in developing a more reliable system for sound event classification. A deep neural network (DNN), which has the ability to extract discriminative representations of features, is a good candidate for the acoustic model of sound events. Compared to other data such as speech or images, however, the amount of sound event data is often insufficient for learning the DNN properly, resulting in overfitting problems. In this paper, we propose a cross-acoustic transfer learning framework that can effectively train the DNN even with insufficient sound data by employing rich speech data. Three datasets are used to evaluate our proposed method: one sound dataset from the Real World Computing Partnership (RWCP) DB and two speech datasets from the Resource Management (RM) and Wall Street Journal (WSJ) DBs. A series of experimental results verify that cross-acoustic transfer learning performs significantly better than the baseline DNN trained only on sound data, achieving a 26.24% relative classification error rate (CER) improvement over the DNN baseline system.
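The transfer idea, pretrain on plentiful source data, keep the learned hidden representation, and retrain only the output layer on the scarce target data, can be sketched with a tiny numpy MLP. The two-class synthetic tasks, network sizes, and training loop below are stand-ins for the paper's DNN and the RWCP/RM/WSJ corpora, chosen only to make the mechanics concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train(X, Y, W1, W2, lr=0.1, epochs=200, freeze_hidden=False):
    """One-hidden-layer net, full-batch gradient descent on cross-entropy.
    With freeze_hidden=True only the output layer W2 is updated."""
    for _ in range(epochs):
        H = relu(X @ W1)
        P = softmax(H @ W2)
        W2 -= lr * (H.T @ (P - Y) / len(X))
        if not freeze_hidden:
            GH = (P - Y) @ W2.T * (H > 0)
            W1 -= lr * (X.T @ GH / len(X))
    return W1, W2

# source task: plentiful "speech" data (synthetic stand-in)
Xs = rng.normal(size=(500, 8))
ys = (Xs[:, 0] + Xs[:, 1] > 0).astype(int)
W1 = rng.normal(scale=0.5, size=(8, 16))
W2 = rng.normal(scale=0.5, size=(16, 2))
W1, W2 = train(Xs, np.eye(2)[ys], W1, W2)

# target task: scarce "sound" data; transfer the hidden layer,
# re-initialize and retrain only the output layer
Xt = rng.normal(size=(30, 8))
yt = (Xt[:, 0] - Xt[:, 1] > 0).astype(int)
W2t = rng.normal(scale=0.5, size=(16, 2))
W1_frozen = W1.copy()  # kept to verify the hidden layer stays fixed
W1, W2t = train(Xt, np.eye(2)[yt], W1, W2t, freeze_hidden=True)
acc = (softmax(relu(Xt @ W1) @ W2t).argmax(1) == yt).mean()
```

Whether to freeze or merely warm-start the transferred layers is a design choice; freezing guards hardest against overfitting the small target set.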


Conference of the International Speech Communication Association | 2016

Dysarthric speech recognition using Kullback-Leibler divergence-based hidden Markov model

Myung Jong Kim; Jun Wang; Hoirin Kim

Dysarthria is a neuro-motor speech disorder that impedes the physical production of speech. Patients with dysarthria often have trouble pronouncing certain sounds, resulting in undesirable phonetic variation. Current automatic speech recognition systems designed for the general public are ineffective for dysarthric speakers due to this phonetic variation. In this paper, we investigate dysarthric speech recognition using Kullback-Leibler divergence-based hidden Markov models. In the model, the emission probability of each state is modeled by a categorical distribution over phoneme posterior probabilities from a deep neural network, and therefore it can effectively capture the phonetic variation of dysarthric speech. Experimental evaluation on a database of several hundred words uttered by 30 speakers, consisting of 12 mildly dysarthric, 8 moderately dysarthric, and 10 control speakers, showed that our approach provides substantial improvement over conventional Gaussian mixture model and deep neural network based speech recognition systems.
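The KL-HMM scoring idea can be sketched compactly: each state carries a categorical distribution over phoneme classes, a frame's cost is the KL divergence between that distribution and the DNN's posterior vector, and a left-to-right Viterbi pass accumulates the minimum total divergence. The direction of the KL divergence, the toy distributions, and the function names below are illustrative simplifications of the paper's model.

```python
import numpy as np

def kl(p, q, eps=1e-10):
    """KL divergence between two categorical distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def kl_hmm_score(posteriors, states):
    """Minimum cumulative KL between DNN posteriors (T x D) and a
    left-to-right sequence of state distributions (S x D), via Viterbi DP.
    Lower score = better match; decode a word by argmin over word models."""
    T, S = len(posteriors), len(states)
    INF = float("inf")
    D = np.full((T, S), INF)
    D[0, 0] = kl(states[0], posteriors[0])
    for t in range(1, T):
        for s in range(S):
            # left-to-right topology: stay in s or advance from s-1
            prev = min(D[t - 1, s], D[t - 1, s - 1] if s > 0 else INF)
            if prev < INF:
                D[t, s] = prev + kl(states[s], posteriors[t])
    return D[T - 1, S - 1]
```

Because the state distributions are categorical over phoneme posteriors rather than Gaussians over acoustics, systematic dysarthric mispronunciations can be absorbed directly into the state distributions.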


International Conference on Computers Helping People with Special Needs | 2012

Automatic assessment of dysarthric speech intelligibility based on selected phonetic quality features

Myung Jong Kim; Hoirin Kim

This paper addresses the problem of assessing the speech intelligibility of patients with dysarthria, which is a motor speech disorder. Dysarthric speech exhibits spectral distortion caused by poor articulation. To characterize the distorted spectral information, several features related to phonetic quality are extracted. Then, we find the best feature subset, one that not only produces a small prediction error but also keeps the mutual dependency among features low. Finally, the selected features are linearly combined using a multiple regression model. Evaluation of the proposed method on a database of 94 patients with dysarthria demonstrates its effectiveness in predicting subjectively rated scores.
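The selection criterion described here, keep prediction error small while keeping mutual dependency among chosen features low, can be approximated with a greedy correlation-based pass followed by least-squares multiple regression. This is a stand-in for illustration, not the paper's exact criterion, and the function names and threshold are assumptions.

```python
import numpy as np

def select_features(X, y, k=2, max_corr=0.8):
    """Greedy selection: repeatedly add the feature most correlated with
    the target y, skipping any feature whose absolute correlation with an
    already-selected feature is high (low mutual dependency)."""
    chosen = []
    relevance = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    for j in np.argsort(relevance)[::-1]:
        if all(abs(np.corrcoef(X[:, j], X[:, i])[0, 1]) < max_corr
               for i in chosen):
            chosen.append(int(j))
        if len(chosen) == k:
            break
    return chosen

def fit_regression(X, y):
    """Multiple linear regression via least squares, intercept included."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef
```

With redundant features (e.g. two near-identical phonetic-quality measures), only one survives selection, which keeps the regression coefficients stable on a small patient database.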

Collaboration


Dive into Hoirin Kim's collaboration.

Top Co-Authors

Sungtak Kim
Information and Communications University

Mikyong Ji
Information and Communications University

Mansoo Park
Information and Communications University

Suk-Bong Kwon
Information and Communications University