
Publications

Featured research published by Jeong-Sik Park.


IEEE Transactions on Consumer Electronics | 2013

Acoustic interference cancellation for a voice-driven interface in smart TVs

Jeong-Sik Park; Gil-Jin Jang; Ji-Hwan Kim; Sang Hoon Kim

A novel method is proposed to improve voice recognition performance by suppressing acoustic interferences that add nonlinear distortion to a target recording signal when received by the recognition device. The proposed method is expected to provide the best performance in smart TV environments, where a remote control collects command speech with its internal microphone and performs automatic voice recognition, and a secondary microphone equipped in the TV set provides the reference signal for the background noise source. Because of the transmission channel, the original interference is corrupted nonlinearly, and conventional speech enhancement techniques such as beamforming and blind signal separation are not applicable. The proposed method first equalizes the interference in the two microphones by maximizing the instantaneous correlation between the nonlinearly related target recording and reference signal, and then suppresses the equalized interference. To obtain an optimal estimate of the equalization filter, a method for detecting instantaneous activity of the interference is also proposed. The validity of the proposed method is demonstrated by the improvement in automatic voice recognition performance in a simulated TV room where loud TV sounds or babbling speech interfere with a user's command speech.
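As a point of reference, reference-signal-based interference suppression of the kind this paper improves upon can be sketched with a standard normalized LMS (NLMS) adaptive filter. This is a linear baseline, not the paper's correlation-maximizing nonlinear equalization; all signal names here are invented for illustration.

```python
import numpy as np

def nlms_cancel(target, reference, taps=32, mu=0.5, eps=1e-8):
    """Suppress interference in `target` using `reference` via NLMS.

    Estimates the reference-to-target channel with an adaptive FIR
    filter and subtracts the filtered reference from the recording.
    """
    w = np.zeros(taps)
    out = np.zeros_like(target, dtype=float)
    for n in range(len(target)):
        # most recent `taps` reference samples, newest first (zero-padded early on)
        x = reference[max(0, n - taps + 1):n + 1][::-1]
        x = np.pad(x, (0, taps - len(x)))
        y = w @ x                          # estimated interference component
        e = target[n] - y                  # error signal = cleaned sample
        w += mu * e * x / (x @ x + eps)    # normalized LMS weight update
        out[n] = e
    return out

# toy check: target = speech + linearly filtered interference
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 0.01 * np.arange(4000))
interf = rng.standard_normal(4000)
channel = np.array([0.8, 0.3, 0.1])
mixed = speech + np.convolve(interf, channel)[:4000]
cleaned = nlms_cancel(mixed, interf)
```

When the channel is linear, the filter converges and the residual interference shrinks; the paper's point is precisely that a nonlinear channel breaks this assumption.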


Cognitive Computation | 2012

Speaker-Characterized Emotion Recognition using Online and Iterative Speaker Adaptation

Jaebok Kim; Jeong-Sik Park; Yung-Hwan Oh

This paper proposes a novel speech emotion recognition (SER) framework for affective interaction between humans and personal devices. Most conventional SER techniques adopt a speaker-independent model framework because of the sparseness of individual speech data. However, a large amount of individual data can be accumulated on a personal device, making it possible to construct speaker-characterized emotion models via a speaker adaptation procedure. In this study, to address problems associated with conventional adaptation approaches in SER tasks, we modified a representative adaptation technique, maximum likelihood linear regression (MLLR), on the basis of selective label refinement. We subsequently carried out the modified MLLR procedure in an online and iterative manner, using accumulated individual data, to further enhance the speaker-characterized emotion models. In SER experiments on an emotional corpus, our approach exhibited performance superior to that of conventional adaptation techniques as well as the speaker-independent model framework.
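The core MLLR idea, an affine transform of the speaker-independent Gaussian means fitted to the target speaker's data, can be sketched as below. This is a heavily simplified single-transform version with equal weighting and assumed-known labels; the paper's contribution, selective label refinement, is not reproduced here.

```python
import numpy as np

def mllr_mean_transform(means, adapt_data, labels):
    """Estimate one global affine transform (A, b) in the style of MLLR.

    means:      (K, D) speaker-independent Gaussian means, one per emotion
    adapt_data: (N, D) adaptation feature vectors from the target speaker
    labels:     (N,)   emotion index of each adaptation vector

    Fits mu_k' = A mu_k + b by least squares against the per-class sample
    means of the adaptation data, then returns the adapted means.
    """
    K, D = means.shape
    targets = np.vstack([adapt_data[labels == k].mean(axis=0) for k in range(K)])
    X = np.hstack([means, np.ones((K, 1))])          # extended means [mu; 1]
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)  # (D+1, D) transform
    A, b = W[:D].T, W[D]
    return means @ A.T + b

# toy check: recover a known global shift of the SI means
si = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 50)
data = si[labels] + 1.5 + 0.1 * rng.standard_normal((150, 2))
adapted = mllr_mean_transform(si, data, labels)
```

In practice MLLR weights the statistics by state occupancy and uses regression classes; the least-squares fit above only conveys the shape of the transform.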


IEEE Transactions on Consumer Electronics | 2012

Multistage utterance verification for keyword recognition-based online spoken content retrieval

Jeong-Sik Park; Gil-Jin Jang; Ji-Hwan Kim

This paper proposes a multistage utterance verification method as a post-processing technique for online spoken content retrieval in portable electronic devices. The online spoken content retrieval system analyzes spoken content in an online manner and searches for speech segments of pre-defined keywords. To maintain stable performance, we propose a reliable post-processing technique that verifies whether a found utterance, a candidate keyword segment, can ultimately be categorized as a keyword. The proposed method involves a two-stage procedure for utterance verification. The first stage utilizes a confidence measure based on N-best log-likelihood recognition results. In the second stage, the Dynamic Time Warping (DTW) algorithm is applied to obtain a verification result. As neither procedure is computationally intensive, both are well suited to online retrieval on portable devices such as smartphones. To assess the proposed technique, experiments on multimedia content retrieval tasks were performed using spoken broadcast news data. The evaluation results revealed that the performance of the proposed method was superior to that of the conventional approach.
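The two verification stages can be sketched as follows. The exact confidence formula and any thresholds are illustrative assumptions, not the paper's; only the general scheme (N-best confidence, then DTW against a keyword template) follows the abstract.

```python
import numpy as np

def nbest_confidence(loglikes):
    """Stage 1: confidence from N-best log-likelihoods.

    One common measure: the gap between the best hypothesis and the mean
    of the remaining hypotheses; a large gap suggests a confident match.
    """
    ranked = sorted(loglikes, reverse=True)
    return ranked[0] - np.mean(ranked[1:])

def dtw_distance(a, b):
    """Stage 2: Dynamic Time Warping distance between two feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

# a candidate passes if it is confident AND close to the keyword template
conf = nbest_confidence([-120.0, -150.0, -155.0])
dist = dtw_distance([1, 2, 3, 2, 1], [1, 2, 2, 3, 2, 1])
```

Both computations are a few arithmetic operations per frame pair, which is why the abstract argues they fit on-device, online retrieval.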


EURASIP Journal on Audio, Speech, and Music Processing | 2012

Music-aided affective interaction between human and service robot

Jeong-Sik Park; Gil-Jin Jang; Yong-Ho Seo

This study proposes a music-aided framework for affective interaction of service robots with humans. The framework consists of three systems, for perception, memory, and expression, modeled on mechanisms of the human brain. We propose a novel approach to identify human emotions in the perception system. Conventional approaches use speech and facial expressions as representative bimodal indicators for emotion recognition. Our approach, in contrast, uses the mood of music as a supplementary indicator to determine emotions more accurately alongside speech and facial expressions. For multimodal emotion recognition, we propose an effective decision criterion using records of bimodal recognition results relevant to the musical mood. The memory and expression systems also utilize musical data to provide natural and affective reactions to human emotions. To evaluate our approach, we simulated the proposed human-robot interaction with a service robot, iRobiQ. Our perception system exhibited superior performance over the conventional approach, and most human participants noted favorable reactions toward the music-aided affective interaction.
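The role of the music mood as a supplementary indicator can be illustrated with a simple weighted score fusion. The paper's actual decision criterion is history-based (records of bimodal results per musical mood); the emotion labels, weights, and score vectors below are all invented for illustration.

```python
import numpy as np

EMOTIONS = ["happy", "sad", "angry", "neutral"]

def fuse_with_mood(speech_scores, face_scores, mood_prior, w=(0.4, 0.4, 0.2)):
    """Combine speech and facial emotion scores with a music-mood prior.

    All inputs are probability vectors over EMOTIONS; the mood prior acts
    as the supplementary third indicator. A plain weighted sum stands in
    for the paper's record-based decision criterion.
    """
    s, f, m = (np.asarray(v, dtype=float)
               for v in (speech_scores, face_scores, mood_prior))
    combined = w[0] * s + w[1] * f + w[2] * m
    return EMOTIONS[int(np.argmax(combined))]

# speech and face disagree; a sad musical mood breaks the tie
label = fuse_with_mood([0.5, 0.3, 0.1, 0.1],
                       [0.2, 0.5, 0.2, 0.1],
                       [0.1, 0.7, 0.1, 0.1])
```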


Engineering Applications of Artificial Intelligence | 2016

Multistage data selection-based unsupervised speaker adaptation for personalized speech emotion recognition

Jaebok Kim; Jeong-Sik Park

This paper proposes an efficient speech emotion recognition (SER) approach that utilizes personal voice data accumulated on personal devices. A representative weakness of conventional SER systems is the user-dependent performance induced by the speaker-independent (SI) acoustic model framework. However, handheld communication devices such as smartphones provide a collection of individual voice data, and thus suitable conditions for personalized SER that outperforms the SI model framework. By taking advantage of personal devices, we propose an efficient personalized SER scheme employing maximum likelihood linear regression (MLLR), a representative speaker adaptation technique. To further advance the conventional MLLR technique for SER tasks, the proposed approach selects data that convey emotionally discriminative acoustic characteristics and uses only those data for adaptation. For reliable data selection, we conduct multistage selection using a log-likelihood distance-based measure and a universal background model. In SER experiments based on a Linguistic Data Consortium emotional speech corpus, our approach exhibited superior performance when compared to conventional adaptation techniques as well as the SI model framework.
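The likelihood-based selection step can be sketched as a likelihood-ratio test against a universal background model (UBM). Single diagonal Gaussians stand in here for the GMMs used in practice, and the threshold and toy data are illustrative assumptions.

```python
import numpy as np

def gauss_loglike(x, mean, var):
    """Log-likelihood of rows of x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def select_discriminative(data, emo_mean, emo_var, ubm_mean, ubm_var, thresh=0.0):
    """Keep vectors whose log-likelihood ratio against the UBM exceeds `thresh`.

    Vectors that fit the background model as well as the emotion model are
    treated as non-discriminative and excluded from adaptation.
    """
    llr = gauss_loglike(data, emo_mean, emo_var) - gauss_loglike(data, ubm_mean, ubm_var)
    return data[llr > thresh]

# toy data: 100 emotionally marked vectors near the emotion model,
# 100 ambiguous vectors near the background model
rng = np.random.default_rng(2)
emotional = rng.normal(3.0, 1.0, size=(100, 2))
ambiguous = rng.normal(0.0, 1.0, size=(100, 2))
data = np.vstack([emotional, ambiguous])
kept = select_discriminative(data,
                             emo_mean=np.array([3.0, 3.0]), emo_var=np.ones(2),
                             ubm_mean=np.array([0.0, 0.0]), ubm_var=np.ones(2))
```

The multistage aspect of the paper would amount to repeating such a filter with different measures (e.g. a log-likelihood distance first, then the UBM test).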


International Conference on u- and e-Service, Science and Technology | 2012

Voice Command Recognition for Fighter Pilots Using Grammar Tree

Han-Gyu Kim; Jeong-Sik Park; Yung-Hwan Oh; Seongwoo Kim; Bonggyu Kim

This research addresses voice command recognition for fighter pilots. In the fighter system, a voice command is composed of several connected words; the recognizer automatically segments the command into individual words and performs isolated word recognition on each. To improve the performance of the command recognizer, error correction using a grammar tree is proposed: isolated-word recognition errors are corrected in the error correction process. Our experimental results show that the grammar tree significantly improved the performance of the command recognizer.


Telecommunication Systems | 2015

Emotional information processing based on feature vector enhancement and selection for human-computer interaction via speech

Jeong-Sik Park; Ji-Hwan Kim

This paper proposes techniques for enhancement and selection of emotional feature vectors to correctly process emotional information from users' spoken data. In real-world devices, speech signals may contain emotional information that is distorted or anomalous owing to environmental noise and the acoustic similarities between emotions. To correctly enhance harmonics of noise-contaminated speech and thereby utilize them as emotional features, we propose a modified adaptive comb filter, in which the frequency response of the conventional comb filter is re-estimated on the basis of speech presence probability. In addition, to eliminate acoustically anomalous emotional data, we propose a feature vector classification scheme. In this approach, emotional feature vectors are categorized as either discriminative or indiscriminative in an iterative manner, and then only the discriminative vectors are selected for emotional information processing. In emotion recognition experiments using noise-contaminated emotional speech data, our approach exhibited performance superior to that of conventional approaches.
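A comb filter reinforces harmonics by mixing each sample with the sample one pitch period earlier. The sketch below scales the comb gain per sample by a precomputed speech presence probability (SPP); the paper instead re-estimates the filter's frequency response from SPP, so this time-domain form is only a simplified approximation, and the gain value is an assumption.

```python
import numpy as np

def adaptive_comb_filter(x, period, spp, alpha=0.8):
    """Comb-filter `x` to reinforce harmonics at the given pitch period.

    y[n] = (1 - g)*x[n] + g*x[n - period], with the comb gain g scaled
    per sample by the speech presence probability `spp` in [0, 1], so the
    filter relaxes to a pass-through where speech is unlikely.
    """
    y = np.copy(x).astype(float)
    for n in range(period, len(x)):
        g = alpha * spp[n]
        y[n] = (1 - g) * x[n] + g * x[n - period]
    return y

# toy check: a periodic signal plus noise, with SPP = 1 everywhere
t = np.arange(1000)
clean = np.sin(2 * np.pi * t / 50)          # pitch period of 50 samples
noisy = clean + 0.5 * np.random.default_rng(3).standard_normal(1000)
enhanced = adaptive_comb_filter(noisy, period=50, spp=np.ones(1000))
```

Because the clean component repeats every 50 samples while the noise does not, averaging across one period attenuates the noise relative to the harmonics.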


Engineering Applications of Artificial Intelligence | 2015

Unsupervised rapid speaker adaptation based on selective eigenvoice merging for user-specific voice interaction

Dong-Jin Choi; Jeong-Sik Park; Yung-Hwan Oh

Speaker adaptation transforms the standard speaker-independent acoustic models into an adapted model relevant to the user (called the target speaker) in order to provide reliable speech recognition performance. Although several conventional adaptation techniques, such as Maximum Likelihood Linear Regression (MLLR) and Maximum A Posteriori (MAP), have been successfully applied to speech recognition tasks, they depend heavily on the amount of adaptation data. The eigenvoice-based adaptation technique, however, is known to provide reliable performance regardless of the amount of data, even for a very small amount. In this study, we propose an efficient eigenvoice adaptation approach to construct more reliable adapted models. The proposed approach merges eigenvoice sets for possible eigenvoice combinations, and then selects the optimal eigenvoice sets that are most relevant to the target speaker. For this task, we propose an efficient unsupervised eigenvoice selection method as well as a rapid merging technique. In speech recognition experiments using the Defense Advanced Research Projects Agency's Resource Management corpus, the proposed approach exhibited superior performance, compared to conventional methods, in both recognition accuracy and time complexity.
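The basic eigenvoice step, representing the target speaker as the training-speaker mean plus a weighted combination of eigenvoices, can be sketched as a least-squares projection. The basis and supervectors below are toy values; the paper's contribution (merging and selecting eigenvoice sets) would amount to re-solving this fit for candidate subsets and keeping the best.

```python
import numpy as np

def eigenvoice_weights(eigenvoices, mean_voice, observed):
    """Project a target speaker onto an eigenvoice basis.

    eigenvoices: (E, D) basis vectors spanning the speaker space
    mean_voice:  (D,)   mean supervector of the training speakers
    observed:    (D,)   supervector estimated from the target speaker's data

    Solves for weights w minimizing ||observed - (mean + w @ eigenvoices)||
    and returns both w and the adapted supervector.
    """
    w, *_ = np.linalg.lstsq(eigenvoices.T, observed - mean_voice, rcond=None)
    return w, mean_voice + w @ eigenvoices

# toy check: a target lying exactly in the span of two eigenvoices
basis = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
mean = np.array([5.0, 5.0, 5.0])
target = mean + 2 * basis[0] - 1 * basis[1]
w, adapted = eigenvoice_weights(basis, mean, target)
```

Because only E weights are estimated rather than a full model, very little adaptation data suffices, which is the property the abstract highlights.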


International Journal of Distributed Sensor Networks | 2014

Autonomous Position Estimation of a Mobile Node Based on Landmark and Localization Sensor

Se-Jun Park; Jeong-Sik Park; Yong-Ho Seo; Tae-Kyu Yang

This study proposes an efficient position estimation method for localizing a mobile node in an indoor environment. Although several conventional methods have been successfully applied to position estimation, they have drawbacks such as low extendibility in an indoor space, intensive computation, and estimation errors. We propose a precise estimation approach based on a localization sensor and artificial landmarks. In our approach, a mobile node autonomously measures the locations of landmarks attached to the ceiling with a localization sensor while moving across the landmarks, building a landmark map. The node then estimates its own location beneath the ceiling using the map. In this process, we use a landmark histogram and a Kalman filter to reduce estimation errors. Several experiments performed using a mobile robot successfully demonstrated the feasibility of our proposed approach.
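The Kalman filtering step can be sketched in one dimension: odometry drives the prediction, and each landmark fix drives the update. The noise variances and motion model below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def kalman_step(x, P, u, z, q=0.05, r=0.1):
    """One predict/update cycle of a 1-D Kalman filter.

    x, P: current position estimate and its variance
    u:    odometry displacement since the last step (prediction input)
    z:    position measured from a ceiling landmark (update input)
    q, r: process and measurement noise variances
    """
    # predict: move by odometry; uncertainty grows by the process noise
    x, P = x + u, P + q
    # update: blend in the landmark measurement via the Kalman gain
    K = P / (P + r)
    return x + K * (z - x), (1 - K) * P

# toy run: the robot moves 1.0 per step; noisy landmark fixes bound the drift
rng = np.random.default_rng(4)
true_pos, x, P = 0.0, 0.0, 1.0
for _ in range(50):
    true_pos += 1.0
    u = 1.0 + rng.normal(0, 0.2)        # noisy odometry
    z = true_pos + rng.normal(0, 0.3)   # noisy landmark measurement
    x, P = kalman_step(x, P, u, z, q=0.04, r=0.09)
```

Without the update step the odometry error would accumulate without bound; the landmark measurements keep the variance P at a small steady-state value.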


The Journal of the Institute of Webcasting, Internet and Telecommunication | 2013

Performance Improvement of Human Detection in Thermal Images using Principal Component Analysis and Blob Clustering

Ahra Jo; Jeong-Sik Park; Yong-Ho Seo; Gil-Jin Jang

In this paper, we propose a human detection technique using a thermal imaging camera. The proposed method is useful at night or in rainy weather, where visible-light cameras cannot detect human activities. Under the observation that a human is usually brighter than the background in thermal images, we estimate preliminary human regions using statistical confidence measures of the gray-level brightness histogram. We then apply Gaussian filtering and blob labeling to remove unwanted noise and to cluster the scattered pixel distributions around the centers of gravity of the blobs. In the final step, we exploit the aspect ratio and area of the unified object region, as well as the principal components extracted from the object region images, to determine whether the detected object is a human. The experimental results show that the proposed method is effective in environments where visible-light cameras are not applicable.
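The threshold-then-blob part of the pipeline can be sketched as follows. This simplified version uses bright-pixel thresholding, 4-connected flood-fill labeling, and an area plus aspect-ratio test; the Gaussian filtering and PCA stages of the paper are omitted, and all thresholds are assumptions.

```python
import numpy as np

def detect_human_blobs(img, thresh, min_area=4, max_aspect=4.0):
    """Threshold a thermal image and keep blobs with human-like shape."""
    mask = img > thresh
    labels = np.zeros(img.shape, dtype=int)
    blobs, cur = [], 0
    for i, j in zip(*np.nonzero(mask)):
        if labels[i, j]:
            continue                      # pixel already belongs to a blob
        cur += 1
        stack, pix = [(i, j)], []
        labels[i, j] = cur
        while stack:                      # 4-connected flood fill
            a, b = stack.pop()
            pix.append((a, b))
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                na, nb = a + da, b + db
                if (0 <= na < img.shape[0] and 0 <= nb < img.shape[1]
                        and mask[na, nb] and not labels[na, nb]):
                    labels[na, nb] = cur
                    stack.append((na, nb))
        rows = [p[0] for p in pix]
        cols = [p[1] for p in pix]
        h = max(rows) - min(rows) + 1
        w = max(cols) - min(cols) + 1
        # upright humans appear taller than wide in the thermal frame
        if len(pix) >= min_area and 1.0 <= h / w <= max_aspect:
            blobs.append(pix)
    return blobs

# toy frame: one upright 6x2 "human" blob plus one hot noise pixel
frame = np.zeros((10, 10))
frame[2:8, 3:5] = 1.0
frame[0, 9] = 1.0
humans = detect_human_blobs(frame, thresh=0.5)
```

The area test discards isolated hot pixels, and the aspect-ratio test discards wide, flat warm regions such as radiators.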

Collaboration

Top co-authors of Jeong-Sik Park:

Gil-Jin Jang, Kyungpook National University
Han-Gyu Kim, Ulsan National Institute of Science and Technology
Ahra Jo, Ulsan National Institute of Science and Technology