Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Biing-Hwang Juang is active.

Publication


Featured research published by Biing-Hwang Juang.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction

Tomohiro Nakatani; Takuya Yoshioka; Keisuke Kinoshita; Masato Miyoshi; Biing-Hwang Juang

This paper proposes a statistical model-based speech dereverberation approach that can cancel the late reverberation of a reverberant speech signal captured by distant microphones without prior knowledge of the room impulse responses. With this approach, the generative model of the captured signal is composed of a source process, which is assumed to be a Gaussian process with a time-varying variance, and an observation process modeled by a delayed linear prediction (DLP). The optimization objective for the dereverberation problem is derived to be the sum of the squared prediction errors normalized by the source variances; hence, this approach is referred to as variance-normalized delayed linear prediction (NDLP). Inheriting the characteristic of DLP, NDLP can robustly estimate an inverse system for late reverberation in the presence of noise without greatly distorting a direct speech signal. In addition, owing to the use of variance normalization, NDLP allows us to improve the dereverberation result especially with relatively short (of the order of a few seconds) observations. Furthermore, NDLP can be implemented in a computationally efficient manner in the time-frequency domain. Experimental results demonstrate the effectiveness and efficiency of the proposed approach in comparison with two existing approaches.
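
The alternating optimization that this abstract describes can be sketched compactly. Below is a minimal single-channel, single-frequency-bin illustration in Python/NumPy, assuming STFT-domain processing; the function name, parameter values, and initialization are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ndlp_dereverb_bin(x, delay=3, order=10, iters=3, eps=1e-8):
    """Variance-normalized delayed linear prediction for one STFT
    frequency bin (single channel, a sketch). x: complex STFT frames,
    shape (T,). Alternate between estimating the time-varying source
    variance and re-solving the normalized least-squares problem."""
    T = len(x)
    d = x.copy()                                # initial dereverberated estimate
    for _ in range(iters):
        lam = np.maximum(np.abs(d) ** 2, eps)   # source variance estimate
        # Delayed regression matrix: row t holds x[t-delay-k] for k = 0..order-1.
        X = np.zeros((T, order), dtype=complex)
        for k in range(order):
            t0 = delay + k
            X[t0:, k] = x[: T - t0]
        # Variance-normalized (weighted) least squares for coefficients g.
        Xw = X / lam[:, None]
        R = Xw.conj().T @ X
        r = Xw.conj().T @ x
        g = np.linalg.solve(R + eps * np.eye(order), r)
        d = x - X @ g                           # subtract predicted late reverb
    return d
```

In the multichannel case the regression matrix would also stack delayed frames from the other microphones, and the routine runs independently in each frequency bin, which is what makes the time-frequency-domain implementation efficient.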


International Conference on Acoustics, Speech, and Signal Processing | 2008

Blind Speech Dereverberation With Multi-Channel Linear Prediction Based on Short Time Fourier Transform Representation

Tomohiro Nakatani; Takuya Yoshioka; Keisuke Kinoshita; Masato Miyoshi; Biing-Hwang Juang

It has recently been shown that exploiting the time-varying nature of speech signals allows us to achieve high-quality speech dereverberation based on multi-channel linear prediction (MCLP). However, this approach incurs a huge computing cost because it calculates large covariance matrices in the time domain. In addition, we face the important problem of how to combine speech dereverberation efficiently with the many other useful speech enhancement techniques that operate in the short time Fourier transform (STFT) domain. As a first step toward overcoming these problems, this paper presents methods for implementing MCLP-based speech dereverberation that allow it to work in the STFT domain at a much lower computing cost. The effectiveness of the presented methods is confirmed by experiments in terms of the recovered signal quality and the computing time.


IEEE Transactions on Multimedia | 2013

Feature Processing and Modeling for 6D Motion Gesture Recognition

Mingyu Chen; Ghassan AlRegib; Biing-Hwang Juang

A 6D motion gesture is represented by a 3D spatial trajectory and augmented by another three dimensions of orientation. Using different tracking technologies, the motion can be tracked explicitly with the position and orientation or implicitly with the acceleration and angular speed. In this work, we address the problem of motion gesture recognition for command-and-control applications. Our main contribution is to investigate the relative effectiveness of various feature dimensions for motion gesture recognition in both user-dependent and user-independent cases. We introduce a statistical feature-based classifier as the baseline and propose an HMM-based recognizer, which offers more flexibility in feature selection and achieves better recognition accuracy than the baseline system. Our motion gesture database, which contains both explicit and implicit motion information, allows us to compare the recognition performance of different tracking signals on a common ground. This study also gives insight into the attainable recognition rate with different tracking devices, which is valuable for system designers choosing the proper tracking technology.
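
As a rough sketch of the HMM-based recognizer described above, the following trains one Gaussian HMM per gesture class on feature sequences and classifies an unknown gesture by maximum log-likelihood. The third-party hmmlearn package is used purely as an example toolkit, and the feature layout, state count, and function names are illustrative assumptions; the paper does not prescribe this implementation.

```python
import numpy as np
from hmmlearn import hmm  # example third-party HMM toolkit (assumption)

def train_gesture_models(train_data, n_states=5):
    """Train one Gaussian HMM per gesture class.
    train_data maps class label -> list of (T_i, D) feature arrays,
    e.g., per-frame position, orientation, acceleration, and
    angular-speed features."""
    models = {}
    for label, seqs in train_data.items():
        X = np.vstack(seqs)                  # concatenate all sequences
        lengths = [len(s) for s in seqs]     # per-sequence frame counts
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, seq):
    """Label an unknown gesture by the model with the highest
    log-likelihood for the observed feature sequence."""
    return max(models, key=lambda label: models[label].score(seq))
```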


Workshop on Applications of Signal Processing to Audio and Acoustics | 2007

On Dealing with Sampling Rate Mismatches in Blind Source Separation and Acoustic Echo Cancellation

Enrique Robledo-Arnuncio; Ted S. Wada; Biing-Hwang Juang

The lack of a common clock reference is a fundamental problem when dealing with audio streams originating from or heading to different distributed sound capture or playback devices. When implementing multichannel signal processing algorithms for such audio streams, it is necessary to account for the unavoidable mismatches between the actual sampling rates. Some approaches can help to correct these mismatches, but important problems remain to be solved, among them the accurate estimation of the mismatch factors and the need for both accuracy and computational efficiency in their correction. In this paper we present an empirical study of the performance of blind source separation and acoustic echo cancellation algorithms in this scenario. We also analyze the degradation in performance when an approximate but efficient method is used to correct the rate mismatches.
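
As a toy illustration of the correction half of the problem, the sketch below compensates a known small sampling-rate mismatch by linear interpolation, one possible "approximate but efficient" corrector in the spirit of the abstract; the mismatch factor is assumed to have been estimated elsewhere, and the paper's actual methods are not reproduced here.

```python
import numpy as np

def correct_rate_mismatch(x, ratio):
    """Resample x to compensate a small sampling-rate mismatch.
    ratio = f_actual / f_nominal (e.g., 1.0001 for +100 ppm drift).
    Linear interpolation is cheap but approximate; a polyphase or
    windowed-sinc resampler would be more accurate."""
    n = len(x)
    # Output sample m corresponds to fractional input index m * ratio.
    t_out = np.arange(int(n / ratio)) * ratio
    return np.interp(t_out, np.arange(n), x)

# Example: a capture whose clock runs 50 ppm fast relative to nominal.
fs = 16000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 440 * t * 1.00005)   # drifted 440 Hz tone
y = correct_rate_mismatch(x, 1.00005)       # re-aligned to the nominal rate
```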


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model

Tomohiro Nakatani; Biing-Hwang Juang; Takuya Yoshioka; Keisuke Kinoshita; Marc Delcroix; Masato Miyoshi

Distant acquisition of acoustic signals in an enclosed space often produces reverberant components due to acoustic reflections in the room. Speech dereverberation is in general desirable when the signal is acquired through distant microphones in such applications as hands-free speech recognition, teleconferencing, and meeting recording. This paper proposes a new speech dereverberation approach based on a statistical speech model. A time-varying Gaussian source model (TVGSM) is introduced as a model that represents the dynamic short time characteristics of nonreverberant speech segments, including the time and frequency structures of the speech spectrum. With this model, dereverberation of the speech signal is formulated as a maximum-likelihood (ML) problem based on multichannel linear prediction, in which the speech signal is recovered by transforming the observed signal into one that is probabilistically more like nonreverberant speech. We first present a general ML solution based on TVGSM, and derive several dereverberation algorithms based on various source models. Specifically, we present a source model consisting of a finite number of states, each of which is manifested by a short time speech spectrum, defined by a corresponding autocorrelation (AC) vector. The dereverberation algorithm based on this model involves a finite collection of spectral patterns that form a codebook. We confirm experimentally that both the time and frequency characteristics represented in the source models are very important for speech dereverberation, and that the prior knowledge represented by the codebook allows us to further improve the dereverberated speech quality. We also confirm that the quality of reverberant speech signals can be greatly improved, in terms of spectral shape and energy time-pattern distortions, from just a short speech signal by using a speaker-independent codebook.
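
In the notation commonly used for this family of methods (an assumption here, not quoted from the paper), the per-frequency-bin ML problem amounts to minimizing a negative log-likelihood of the form:

```latex
% x_t: observed STFT frame, \bar{x}_{t-D}: stacked delayed (multichannel)
% frames, g: prediction coefficients, \lambda_t: time-varying source variance.
\mathcal{L}\bigl(g, \{\lambda_t\}\bigr)
  = \sum_{t} \left( \frac{\lvert d_t(g) \rvert^{2}}{\lambda_t}
    + \log \lambda_t \right),
\qquad
d_t(g) = x_t - g^{\mathsf{H}} \, \bar{x}_{t-D}
```

Minimizing over the variances with g fixed gives lambda_t = |d_t(g)|^2, while minimizing over g with the variances fixed is a variance-normalized least-squares step, as in the NDLP paper listed above; the codebook-based variants further constrain the source model to a finite set of trained spectral patterns.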


Proceedings of the 3rd Multimedia Systems Conference | 2012

6DMG: a new 6D motion gesture database

Mingyu Chen; Ghassan AlRegib; Biing-Hwang Juang

Motion-based control is gaining popularity, and motion gestures form a complementary modality in human-computer interactions. To achieve more robust user-independent motion gesture recognition in a manner analogous to automatic speech recognition, we need a deeper understanding of the motions in gesture, which gives rise to the need for a 6D motion gesture database. In this work, we present a database that contains comprehensive motion data, including the position, orientation, acceleration, and angular speed, for a set of common motion gestures performed by different users. We hope this motion gesture database can be a useful platform for researchers and developers to build their recognition algorithms, as well as a common test bench for performance comparisons.


Workshop on Applications of Signal Processing to Audio and Acoustics | 2009

Acoustic echo cancellation based on independent component analysis and integrated residual echo enhancement

Ted S. Wada; Biing-Hwang Juang

This paper examines the technique of using a memoryless noise-suppressing nonlinearity in the adaptive filter error feedback loop of an acoustic echo canceler (AEC) based on normalized least-mean square (NLMS) when there is additive noise at the near end. It will be shown that introducing the nonlinearity to “enhance” the filter estimation error is well-founded in the information-theoretic sense and has a deep connection to independent component analysis (ICA). The paradigm of AEC as a problem that can be approached by ICA leads to new algorithmic possibilities beyond the conventional LMS family of techniques. In particular, the right combination of the error enhancement procedure and a properly implemented regularization procedure enables the AEC to be performed recursively and continuously in the frequency domain when there is both ambient noise and double-talk, even without a double-talk detection (DTD) or voice activity detection (VAD) procedure.
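
A compact way to picture the technique is an NLMS loop whose adaptation error first passes through a memoryless suppressing nonlinearity. In the sketch below a saturating tanh stands in for that nonlinearity; the paper derives a specific information-theoretically motivated choice that is not reproduced here, and all parameter values are illustrative.

```python
import numpy as np

def nlms_with_error_nonlinearity(x, d, order=256, mu=0.5,
                                 delta=1e-6, sigma=0.05):
    """NLMS echo canceler whose adaptation uses an 'enhanced' error.
    x: far-end reference signal, d: near-end microphone signal.
    The tanh saturates large error samples (e.g., near-end noise or
    double-talk bursts) so they perturb the filter estimate less."""
    w = np.zeros(order)
    e = np.zeros(len(d))
    for n in range(order, len(d)):
        xn = x[n - order + 1:n + 1][::-1]        # reference tap vector
        e[n] = d[n] - w @ xn                     # raw cancellation error
        e_enh = sigma * np.tanh(e[n] / sigma)    # suppressed ("enhanced") error
        w += mu * e_enh * xn / (xn @ xn + delta) # normalized update
    return e, w
```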


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Enhancement of Residual Echo for Robust Acoustic Echo Cancellation

Ted S. Wada; Biing-Hwang Juang

This paper examines the technique of using a noise-suppressing nonlinearity in the adaptive filter error feedback loop of an acoustic echo canceler (AEC) based on the least mean square (LMS) algorithm when there is an interference at the near end. The source of distortion may be linear, such as local speech or background noise, or nonlinear due to speech coding used in telecommunication networks. A detailed derivation of the error recovery nonlinearity (ERN), which “enhances” the filter estimation error prior to the adaptation in order to assist the linear adaptation process, will be provided. Connections to other existing AEC and signal enhancement techniques will be revealed. In particular, the error enhancement technique is well-founded in the information-theoretic sense and has strong ties to independent component analysis (ICA), which is the basis for blind source separation (BSS) that permits unsupervised adaptation in the presence of multiple interfering signals. The single-channel AEC problem can be viewed as a special case of semi-blind source separation (SBSS), where one of the source signals is partially known, i.e., the far-end microphone signal that generates the near-end acoustic echo. The system approach to robust AEC will be motivated, where a proper integration of the LMS algorithm with the ERN into the AEC “system” allows for continuous and stable adaptation even during double-talk, without precise estimation of the signal statistics. The error enhancement paradigm encompasses many traditional signal enhancement techniques and opens up an entirely new avenue for solving the AEC problem in a real-world setting.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Batch-Online Semi-Blind Source Separation Applied to Multi-Channel Acoustic Echo Cancellation

Francesco Nesta; Ted S. Wada; Biing-Hwang Juang

Semi-blind source separation (SBSS) is a special case of the well-known blind source separation (BSS) problem in which some partial knowledge of the source signals is available to the system. In particular, a batch adaptation in the frequency domain based on independent component analysis (ICA) can be effectively used to jointly perform source separation and multichannel acoustic echo cancellation (MCAEC) through SBSS without double-talk detection. Many issues related to the implementation of an SBSS system are discussed in this paper. After a detailed analysis of the structure of the SBSS adaptation, we propose a constrained batch-online implementation that stabilizes the convergence behavior even in the worst-case scenario of a single far-end talker along with the non-uniqueness condition on the far-end mixing system. Specifically, a matrix constraint is proposed to reduce the effect of the non-uniqueness problem caused by highly correlated far-end reference signals during MCAEC. Experimental results show that high echo cancellation can be achieved while the misalignment remains relatively low, without any preprocessing procedure to decorrelate the far-end signals, even in the single far-end talker case.
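
The block structure that makes the adaptation semi-blind can be sketched per frequency bin: the demixing matrix acts on the stacked near-end microphone and known far-end reference frames, and the far-end rows are pinned so the known sources are never re-estimated. The following is a structural sketch with a generic natural-gradient ICA step, not the paper's exact constrained batch-online rule or its proposed matrix constraint.

```python
import numpy as np

def sbss_batch_update(W, Y, S, eta=0.05):
    """One batch natural-gradient ICA update for semi-blind source
    separation in a single frequency bin (a sketch).
    Y: (M, T) near-end microphone STFT frames.
    S: (K, T) known far-end reference STFT frames.
    W: (M+K, M+K) demixing matrix with block form [[Wyy, Wys], [0, I]];
    the bottom rows stay pinned because those sources are known."""
    M, T = Y.shape
    K = S.shape[0]
    Z = np.vstack([Y, S])                      # stacked observation
    U = W @ Z                                  # current separated outputs
    phi = U / (np.abs(U) + 1e-8)               # score fn for super-Gaussian sources
    G = np.eye(M + K) - (phi @ U.conj().T) / T
    W = W + eta * G @ W                        # natural-gradient step
    W[M:, :M] = 0.0                            # re-impose the semi-blind
    W[M:, M:] = np.eye(K)                      # block constraint
    return W
```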


IEEE Transactions on Image Processing | 2009

Subjective Evaluation of Spatial Resolution and Quantization Noise Tradeoffs

Soo Hyun Bae; Thrasyvoulos N. Pappas; Biing-Hwang Juang

Most full-reference fidelity/quality metrics compare the original image to a distorted image at the same resolution, assuming a fixed viewing condition. However, in many applications, such as video streaming, due to the diversity of channel capacities and display devices, the viewing distance and the spatiotemporal resolution of the displayed signal may be adapted in order to optimize the perceived signal quality. For example, in low-bitrate coding applications an observer may prefer to reduce the resolution or increase the viewing distance to reduce the visibility of the compression artifacts. The tradeoff between resolution/viewing conditions and the visibility of compression artifacts requires new approaches to the evaluation of image quality that account for both image distortions and image size. In order to better understand such tradeoffs, we conducted subjective tests using two representative still image coders, JPEG and JPEG 2000. Our results indicate that an observer would indeed prefer a lower spatial resolution (at a fixed viewing distance) in order to reduce the visibility of the compression artifacts, but not all the way to the point where the artifacts are completely invisible. Moreover, the observer is willing to accept more artifacts as the image size decreases. The subjective test results we report can be used to select viewing conditions for coding applications. They also set the stage for the development of novel fidelity metrics. The focus of this paper is on still images, but similar tradeoffs are expected to apply to video.

Collaboration


Dive into Biing-Hwang Juang's collaborations.

Top Co-Authors

Ted S. Wada (Georgia Institute of Technology)
Soo Hyun Bae (Georgia Institute of Technology)
Ghassan AlRegib (Georgia Institute of Technology)
Mingyu Chen (Georgia Institute of Technology)
Yong Zhao (Georgia Institute of Technology)
Keisuke Kinoshita (Nippon Telegraph and Telephone)
Tomohiro Nakatani (Georgia Institute of Technology)
Jason Wung (Georgia Institute of Technology)