Tuan Van Pham | Researchain

Publication

Featured researches published by Tuan Van Pham.

2009 IEEE-RIVF International Conference on Computing and Communication Technologies | 2009

Using Artificial Neural Network for Robust Voice Activity Detection Under Adverse Conditions

Tuan Van Pham; Chien T. Tang; Michael Stadtschnitzer

We present an approach to model-based voice activity detection (VAD) for harsh environments. Using mel-frequency cepstral coefficient (MFCC) features extracted from clean and noisy speech samples, an artificial neural network is optimally trained to provide a reliable model. There are three main aspects to this study. First, in addition to the developed model, recent state-of-the-art VAD methods are analyzed extensively. Second, we present an optimization procedure for neural network training, including evaluation of the trained network's performance with proper measures. Third, a large assortment of empirical results on the noisy TIMIT and SNOW corpora, covering different types of noise at different signal-to-noise ratios, is provided. We evaluate the VAD model on the noisy corpora and compare it against state-of-the-art VAD methods such as ITU-T Rec. G.729 Annex B, ETSI AFE ES 202 050, and recent promising VAD algorithms. Results show that: (i) the proposed neural network classifier employing MFCC features provides robustly high scores under different noisy conditions; (ii) the proposed model is superior to the other VAD methods in terms of various classification measures; (iii) the robustness of the developed VAD algorithm still holds when it is tested in a completely mismatched environment.
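The abstract does not say which classification measures were used. As a minimal sketch, assuming frame-level binary labels (1 = speech, 0 = non-speech), the following computes three common measures for comparing VAD outputs. The function name `vad_frame_measures` and the choice of measures are illustrative assumptions, not the authors' evaluation protocol.

```python
def vad_frame_measures(reference, predicted):
    """Frame-level VAD measures from binary labels (1 = speech, 0 = non-speech).

    Hypothetical helper: the measure set is an assumption, not taken from the paper.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("need at least one frame")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # speech detected
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # speech missed
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # noise flagged as speech
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)  # noise rejected

    def ratio(num, den):
        # Return 0.0 when a class is absent, to avoid division by zero.
        return num / den if den else 0.0

    return {
        "speech_hit_rate": ratio(tp, tp + fn),
        "false_alarm_rate": ratio(fp, fp + tn),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


reference = [0, 0, 1, 1, 1, 0, 0, 1]
predicted = [0, 1, 1, 1, 0, 0, 0, 1]
measures = vad_frame_measures(reference, predicted)
```

Reporting the hit rate and the false-alarm rate separately matters for VAD, because a detector that labels every frame as speech gets a perfect hit rate at the cost of a false-alarm rate of 1.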

Speaker Classification II | 2007

Speaker Segmentation for Air Traffic Control

Michael Neffe; Tuan Van Pham; Horst Hering; Gernot Kubin

In this contribution, a novel speaker segmentation system is designed to improve the safety of voice communication in air traffic control. In addition to using the aircraft identification tag to assign speaker turns on the shared communication channel to aircraft, speaker verification is investigated as an add-on attribute to improve the security level of air traffic control. Verification is performed by training universal background models and speaker-dependent models based on the Gaussian mixture model approach. The feature extraction and normalization units are specifically optimized to deal with restricted bandwidth and very short speaker turns. To enhance the robustness of the verification system, a cross-verification unit is further applied. The designed system is tested on the SPEECHDAT-AT and WSJ0 databases to demonstrate its superior performance.
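The abstract names the models but not the scoring rule. A common way to use a universal background model (UBM) together with a speaker-dependent Gaussian mixture model is the average log-likelihood ratio, sketched below for one-dimensional features. The function names, the toy model parameters, and the 1-D simplification are assumptions made for illustration; real systems score multi-dimensional cepstral vectors.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a scalar x under a 1-D Gaussian mixture."""
    terms = []
    for w, m, v in zip(weights, means, variances):
        log_pdf = -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        terms.append(math.log(w) + log_pdf)
    # Log-sum-exp keeps the sum stable when the terms are very negative.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def verification_score(features, speaker_gmm, ubm):
    """Average log-likelihood ratio: speaker model versus background model."""
    if not features:
        raise ValueError("need at least one feature value")
    total = sum(
        gmm_log_likelihood(x, *speaker_gmm) - gmm_log_likelihood(x, *ubm)
        for x in features
    )
    return total / len(features)


# Toy models as (weights, means, variances); the values are made up.
ubm = ([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
speaker = ([0.5, 0.5], [1.5, 2.5], [0.25, 0.25])

target_score = verification_score([1.8, 2.2, 2.0], speaker, ubm)
impostor_score = verification_score([-2.1, -1.9, -2.0], speaker, ubm)
```

A positive score means the speaker model explains the turn better than the background model. Averaging over frames keeps scores comparable across turns of different lengths, which is relevant when turns are very short.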

international conference on acoustics, speech, and signal processing | 2006

Noise Suppression Based on Wavelet Packet Decomposition and Quantile Noise Estimation for Robust Automatic Speech Recognition

Erhard Rank; Tuan Van Pham; Gernot Kubin

In this paper we address the application of a denoising algorithm based on wavelet packet decomposition and quantile noise estimation to noise suppression for automatic speech recognition. The denoising algorithm is adapted to suit the different requirements of machine recognition, as compared to human perception, and is tested in combination with state-of-the-art speech recognition systems. The results show that, if the proposed algorithm is integrated with the recognition system - including the training process - a performance comparable to recent high-quality noise suppression methods is achieved.
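Quantile noise estimation relies on the observation that speech occupies any given band only part of the time, so a low-to-middle quantile of the band's power over time approximates the noise floor. The sketch below applies that idea to a list of per-frame band powers. The function name, the default `q=0.5`, and the toy data are assumptions for illustration, not the settings used in the paper.

```python
def quantile_noise_estimate(power_frames, q=0.5):
    """Per-band noise estimate: the q-quantile over time of each band's power.

    power_frames is a list of frames, each a list of band powers.
    """
    if not power_frames:
        raise ValueError("need at least one frame")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")

    num_bands = len(power_frames[0])
    estimate = []
    for band in range(num_bands):
        values = sorted(frame[band] for frame in power_frames)
        # Pick the sorted value at the quantile position, clamped to the range.
        index = min(int(q * len(values)), len(values) - 1)
        estimate.append(values[index])
    return estimate


frames = [
    [1.0, 2.0],
    [1.2, 2.1],
    [9.0, 8.0],  # a speech burst raises the power in both bands
    [0.9, 1.9],
    [1.1, 2.2],
]
noise = quantile_noise_estimate(frames, q=0.5)
```

Because the estimate is a quantile rather than a mean, the single loud frame in the toy data does not pull the noise floor upward, and no explicit speech/pause detector is needed.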

international conference on communications | 2010

A novel implementation of the spectral shaping approach for artificial bandwidth extension

Tuan Van Pham; Friedrich Schaefer; Gernot Kubin

We present a novel implementation of the spectral shaping approach for artificial bandwidth extension (ABE). The spectral envelope in the missing band is adaptively tuned by a set of controlled subband powers predicted by a feedforward neural network (FFNN) in conjunction with the use of spline interpolation. There are three main aspects to this study: First, objective quality measures are exploited to select a proper feature vector, to choose relevant critical subbands and to train the network effectively. Second, the extensions toward high and low frequencies are evaluated explicitly. Third, a large assortment of empirical results on speech quality and speech intelligibility is provided. The obtained results indicate a significant improvement of the extended speech w.r.t. the narrowband speech.

international conference on communications | 2008

Robust speech recognition using adaptive noise threshold estimation and wavelet shrinkage

Tuan Van Pham; Gernot Kubin; Erhard Rank

We propose an improved noise reduction method for robust speech recognition based on a perceptually statistical wavelet filtering algorithm. Perceptual noise thresholds are estimated from the universal thresholds for each critical wavelet subband. Fast changes of background noise are tracked adaptively by improving our statistical percentile filtering method. Smoothed wavelet shrinkage is applied to enhance noisy wavelet coefficients. The proposed denoising algorithm is evaluated in terms of recognition performance under adverse noisy conditions such as car and factory environments. Furthermore, it is compared to recent speech enhancement methods embedded in different state-of-the-art speech recognizers. Overall results indicate that recognition performance on the AURORA3 SPEECHDAT-Car corpus is nearly the same as that of the HTK recognizer using the advanced front-end, while there is an improvement when testing with the Loquendo recognizer on the SNOW-Factory corpus.
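For orientation, the universal threshold that the perceptual thresholds start from is usually defined as sigma * sqrt(2 ln N) for N coefficients, with sigma estimated from the median absolute coefficient. The sketch below pairs it with plain soft shrinkage for one subband. This is the textbook technique, swapped in for illustration: the paper's perceptual threshold adaptation and its smoothed shrinkage function are not reproduced here, and the function names and toy coefficients are assumptions.

```python
import math


def universal_threshold(coeffs):
    """Universal threshold sigma * sqrt(2 ln N) for one subband.

    sigma is estimated as median(|coeff|) / 0.6745, a robust noise estimate.
    """
    n = len(coeffs)
    if n == 0:
        raise ValueError("need at least one coefficient")
    magnitudes = sorted(abs(c) for c in coeffs)
    mid = n // 2
    if n % 2:
        median = magnitudes[mid]
    else:
        median = 0.5 * (magnitudes[mid - 1] + magnitudes[mid])
    sigma = median / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))


def soft_shrink(coeffs, threshold):
    """Soft shrinkage: zero small coefficients, pull large ones toward zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


# Toy subband: mostly small noise-like values plus two large coefficients.
subband = [0.1, -0.2, 0.15, 5.0, -0.05, 0.1, -4.0, 0.2]
threshold = universal_threshold(subband)
denoised = soft_shrink(subband, threshold)
```

In the toy subband the six small coefficients fall below the threshold and are set to zero, while the two large ones survive with their magnitude reduced by the threshold.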

multimedia signal processing | 2006

Audio-Visual Feature Extraction for Semi-Automatic Annotation of Meetings

Marian Kepesi; Michael Neffe; Tuan Van Pham; Michael Grabner; Helmut Grabner; Andreas Juffinger

In this paper we present the building blocks of our semi-automatic annotation tool, which supports multi-modal and multi-level annotation of meetings. The main focus is on the proper design and functionality of the modules for recognizing meeting actions. The key features, identity and position of the speakers, are provided by different modalities (audio and video). Three audio algorithms (voice activity detection, speaker identification and direction of arrival) and three video algorithms (detection, tracking and identification) form the low-level feature extraction components. Low-level features are automatically merged, and the recognized actions are proposed to the user by visualizing them. The annotation labels are related but not limited to events during meetings. The user can finally confirm or, if necessary, modify the suggestion, and then store the actions in a database.

conference of the international speech communication association | 2004