Mikyong Ji
Information and Communications University
Publication
Featured research published by Mikyong Ji.
IEEE Signal Processing Letters | 2007
Youngjoo Suh; Mikyong Ji; Hoirin Kim
In this letter, a probabilistic class histogram equalization method is proposed to compensate for acoustic mismatch in noise-robust speech recognition. The proposed method aims not only to compensate for the acoustic mismatch between training and test environments but also to reduce the limitations of conventional histogram equalization. It utilizes multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes by means of soft classification with a Gaussian mixture model, and equalizes the features by using their corresponding class-specific distributions. Experiments on the Aurora 2 task confirm the superiority of the proposed approach in acoustic feature compensation.
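The equalization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scalar features, one Gaussian per class for the soft classifier, Gaussian reference distributions, and a posterior-weighted empirical test CDF. All function names and the clipping constant `eps` are hypothetical.

```python
from statistics import NormalDist

def posteriors(x, classes):
    """Soft classification: class posteriors for scalar x.
    `classes` is a list of (prior, NormalDist) pairs (assumed 1-D GMM)."""
    scores = [prior * dist.pdf(x) for prior, dist in classes]
    total = sum(scores)
    return [s / total for s in scores]

def weighted_cdf(x, samples, weights, eps=1e-3):
    """Posterior-weighted empirical CDF at x, clipped into (0, 1)
    so that the inverse reference CDF is always defined."""
    below = sum(w for s, w in zip(samples, weights) if s <= x)
    return min(max(below / sum(weights), eps), 1.0 - eps)

def equalize(test, test_classes, ref_classes):
    """Map each test feature through class-specific test and reference
    CDFs, then mix the per-class results with the class posteriors."""
    post = [posteriors(x, test_classes) for x in test]
    out = []
    for i, x in enumerate(test):
        y = 0.0
        for c, ref in enumerate(ref_classes):
            class_weights = [p[c] for p in post]
            u = weighted_cdf(x, test, class_weights)
            y += post[i][c] * ref.inv_cdf(u)
        out.append(y)
    return out

# Single-class case reduces to ordinary histogram equalization.
noisy = [3.0, 4.0, 5.0, 6.0, 7.0]
print(equalize(noisy, [(1.0, NormalDist(5, 2))], [NormalDist(0, 1)]))
```

With a single class the mapping is plain histogram equalization; adding classes lets each region of the feature space use its own pair of distributions.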
IEEE Transactions on Consumer Electronics | 2008
Mikyong Ji; Sungtak Kim; Hoirin Kim; Ho-Sub Yoon
With the aim of achieving the best possible speaker identification rate in a distant-talking environment, we developed a multiple microphone-based text-independent speaker identification system using soft channel selection. The system uses a single perceptron to estimate the reliability of each individual channel result, then selects and combines the identification results accordingly. It thus allows for user-customized service with high identification accuracy in home robot environments. Experimental results show that the proposed system is effective in a distant-talking environment, thereby providing a speech interface for a wide range of potential hands-free applications in a ubiquitous environment.
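One way to picture soft channel selection is sketched below. The abstract does not specify the perceptron inputs or the combination rule, so this sketch assumes a sigmoid unit that turns hypothetical per-channel features (here, a score margin) into a reliability in (0, 1), and a reliability-weighted sum of per-speaker log-likelihoods. The weights and bias are illustrative values, not trained parameters.

```python
import math

def reliability(features, weights, bias):
    """Single perceptron with a sigmoid output: channel reliability in (0, 1)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def identify(channel_scores, channel_features, weights, bias):
    """Combine per-speaker scores across channels, weighting each channel
    by its estimated reliability, and return the best speaker index."""
    n_speakers = len(channel_scores[0])
    combined = [0.0] * n_speakers
    for scores, feats in zip(channel_scores, channel_features):
        r = reliability(feats, weights, bias)
        for k, s in enumerate(scores):
            combined[k] += r * s
    return max(range(n_speakers), key=combined.__getitem__)

# Two microphones, two enrolled speakers (log-likelihood scores).
scores = [[-10.0, -9.5],   # channel 0: small margin, weakly prefers speaker 1
          [-8.0, -12.0]]   # channel 1: large margin, prefers speaker 0
features = [[0.5], [4.0]]  # assumed feature: margin between top two scores
print(identify(scores, features, weights=[1.0], bias=-2.0))
```

Because the reliability is continuous, a poor channel is down-weighted rather than discarded outright, which is what distinguishes soft selection from hard selection.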
Robot and Human Interactive Communication | 2007
Mikyong Ji; Sungtak Kim; Hoirin Kim; Keun-Chang Kwak; Young-Jo Cho
This paper presents a text-independent speaker identification system that uses multiple microphones mounted on a robot and is intended for human-robot interaction. To obtain the best possible classification rate, the individual identification results from the microphones are combined by various combination schemes, which improves performance over a single microphone. Our ultimate goal is to enhance human-robot interaction by improving speaker identification with multiple robot-side microphones in adverse distant-talking environments. The combination schemes achieved high classification accuracy in the ubiquitous robot companion (URC) environment, in which the robot is connected to a server and which relies on an extremely high broadband penetration rate. In conclusion, our speaker identification system can provide human-robot interaction with a reliable basic interface with high classification accuracy.
IEICE Transactions on Information and Systems | 2007
Sungtak Kim; Mikyong Ji; Youngjoo Suh; Hoirin Kim
Recently, many techniques have been proposed to improve speaker identification in noisy environments. Among these, we consider the feature recombination technique for the multi-band approach to noise-robust speaker identification. Conventional feature recombination is very effective under band-limited noise, but under broad-band noise it provides no notable improvement over the full-band system. Even when speech is corrupted by broad-band noise, the degree of corruption differs from one sub-band to another. Conventional feature recombination uses all sub-band features equally to compute the multi-band likelihood score, so it does not exploit this advantage of the multi-band approach, even though the sub-band features are extracted independently. Here we propose a new technique of sub-band likelihood computation with sub-band weighting in the feature recombination method. The signal-to-noise ratio (SNR) is used to compute the sub-band weights. The proposed sub-band-weighted likelihood computation makes a speaker identification system more robust to noise. Experimental results show that the average error reduction rate (ERR) in various noise environments is more than 24% compared with the conventional feature recombination-based speaker identification system.
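The idea of SNR-driven sub-band weighting can be sketched as follows. The abstract does not give the weighting formula, so this example assumes weights proportional to the linear-scale SNR of each sub-band, normalized to sum to one, and a weighted sum of sub-band log-likelihoods. The function names and the numbers are hypothetical.

```python
def subband_weights(snr_db):
    """Assumed weighting: linear-scale SNR per sub-band, normalized to sum to 1."""
    linear = [10 ** (s / 10) for s in snr_db]
    total = sum(linear)
    return [v / total for v in linear]

def weighted_score(subband_loglik, weights):
    """Sub-band-weighted log-likelihood of one speaker model."""
    return sum(w * l for w, l in zip(weights, subband_loglik))

def identify(loglik_by_speaker, snr_db):
    """Return the speaker whose weighted multi-band score is highest."""
    w = subband_weights(snr_db)
    return max(loglik_by_speaker,
               key=lambda spk: weighted_score(loglik_by_speaker[spk], w))

# Band 0 is clean (20 dB), band 1 is heavily corrupted (0 dB).
loglik = {"A": [-5.0, -30.0], "B": [-8.0, -10.0]}
print(identify(loglik, snr_db=[20.0, 0.0]))   # weighted decision
print(max(loglik, key=lambda s: sum(loglik[s])))  # unweighted decision
```

In this toy case the unweighted sum is dominated by the corrupted band, while the weighted score follows the clean band, which is the behavior the weighting is meant to produce.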
International Conference on Consumer Electronics | 2008
Mikyong Ji; Sungtak Kim; Hoirin Kim; Ho-Sub Yoon
With the aim of improving speaker identification in a multi-microphone environment, we develop a text-independent speaker identification system. It incorporates soft channel selection before the combination of the identification results obtained by multiple microphones. The results demonstrate that the proposed system achieves high classification accuracy, thereby providing a speech interface for a wide range of potential hands-free applications in a ubiquitous environment.
IEICE Transactions on Information and Systems | 2007
Mikyong Ji; Sungtak Kim; Hoirin Kim
With the aim of improving speaker identification, we propose a likelihood-based integration method to combine the speaker identification results obtained through multiple microphones. In many cases, the composite result has a lower error rate than that of any single channel. The proposed integration method can achieve more reliable identification performance in the ubiquitous robot companion (URC) environment, in which the robot is connected to a server and which relies on an extremely high broadband penetration rate.
Text, Speech and Dialogue | 2006
Mikyong Ji; Sungtak Kim; Hoirin Kim
This paper describes an integrated system to produce a composite recognition output on distant-talking speech when the recognition results from multiple microphone inputs are available. In many cases, the composite recognition result has a lower error rate than any individual output. In this work, the composite recognition result is obtained by applying Bayesian inference. The log likelihood score is assumed to follow a Gaussian distribution, at least approximately. First, the distribution of the likelihood score is estimated on the development set. Then, the confidence interval for the likelihood score is used to remove unreliable microphone channels. Finally, the area under the distribution between the likelihood score of a hypothesis and that of the (N+1)st hypothesis is obtained for every channel and integrated over all channels by Bayesian inference. The proposed system shows considerable performance improvement compared with both an ordinary summation of likelihoods and the recognition result of any single channel.
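The three steps above (fit a Gaussian, reject unreliable channels, integrate areas) can be sketched roughly as follows. This is an interpretation under stated assumptions, not the paper's exact procedure: channels are treated as independent, every channel lists the same hypotheses, per-channel areas are normalized before being multiplied, and the 1.96 threshold (a 95% interval) is an illustrative choice.

```python
from statistics import NormalDist, mean, stdev

def fit(dev_scores):
    """Estimate the Gaussian of log-likelihood scores on a development set."""
    return NormalDist(mean(dev_scores), stdev(dev_scores))

def combine(channels, z=1.96):
    """channels: list of (dist, nbest, floor), where nbest maps hypothesis
    to score and floor is the score of the (N+1)st hypothesis.
    Returns the best hypothesis, or None if every channel is rejected."""
    evidence = {}
    for dist, nbest, floor in channels:
        best = max(nbest.values())
        # Drop channels whose best score falls outside the confidence interval.
        if abs(best - dist.mean) > z * dist.stdev:
            continue
        # Area under the Gaussian between each score and the (N+1)st score.
        areas = {h: dist.cdf(s) - dist.cdf(floor) for h, s in nbest.items()}
        total = sum(areas.values())
        for h, a in areas.items():
            # Naive-Bayes style product across channels (independence assumed).
            evidence[h] = evidence.get(h, 1.0) * a / total
    return max(evidence, key=evidence.get) if evidence else None

dist = fit([-52.0, -50.0, -48.0, -51.0, -49.0])
channels = [
    (dist, {"yes": -49.0, "no": -51.0}, -53.0),
    (dist, {"yes": -50.5, "no": -50.0}, -53.0),
    (dist, {"yes": -85.0, "no": -80.0}, -90.0),  # outlier channel, rejected
]
print(combine(channels))
```

Here the third channel is removed by the interval test, and the remaining two disagree; the product of normalized areas resolves the disagreement in favor of the hypothesis with the stronger overall support.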
ETRI Journal | 2008
Sungtak Kim; Mikyong Ji; Hoirin Kim
The Journal of the Acoustical Society of Korea | 2009
Sungtak Kim; Mikyong Ji; Hoirin Kim
Archive | 2008
Mikyong Ji; Sungtak Kim; Hoirin Kim