Børge Lindberg
Aalborg University
Publication
Featured research published by Børge Lindberg.
Archive | 2008
Zheng-Hua Tan; Børge Lindberg
Table of contents:
- Network Speech Recognition
- Network, Distributed and Embedded Speech Recognition: An Overview
- Speech Coding and Packet Loss Effects on Speech and Speaker Recognition
- Speech Recognition Over Mobile Networks
- Speech Recognition Over IP Networks
- Distributed Speech Recognition
- Distributed Speech Recognition Standards
- Speech Feature Extraction and Reconstruction
- Quantization of Speech Features: Source Coding
- Error Recovery: Channel Coding and Packetization
- Error Concealment
- Embedded Speech Recognition
- Algorithm Optimizations: Low Computational Complexity
- Algorithm Optimizations: Low Memory Footprint
- Fixed-Point Arithmetic
- Systems and Applications
- Software Architectures for Networked Mobile Speech Applications
- Speech Recognition in Mobile Phones
- Handheld Speech to Speech Translation System
- Automotive Speech Recognition
- Energy Aware Speech Recognition for Mobile Devices
IEEE Journal of Selected Topics in Signal Processing | 2010
Zheng-Hua Tan; Børge Lindberg
Frame-based speech processing inherently assumes a stationary behavior of speech signals in a short period of time. Over a long time, the characteristics of the signals can change significantly and frames are not equally important, underscoring the need for frame selection. In this paper, we present a low-complexity and effective frame selection approach based on a posteriori signal-to-noise ratio (SNR) weighted energy distance: The use of an energy distance, instead of, e.g., a standard cepstral distance, makes the approach computationally efficient and enables fine granularity search, and the use of a posteriori SNR weighting emphasizes the reliable regions in noisy speech signals. It is experimentally found that the approach is able to assign a higher frame rate to fast changing events such as consonants, a lower frame rate to steady regions like vowels and no frames to silence, even for very low SNR signals. The resulting variable frame rate analysis method is applied to three speech processing tasks that are essential to natural interaction with intelligent environments. First, it is used for improving speech recognition performance in noisy environments. Second, the method is used for scalable source coding schemes in distributed speech recognition where the target bit rate is met by adjusting the frame rate. Third, it is applied to voice activity detection. Very encouraging results are obtained for all three speech processing tasks.
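The selection rule described above can be sketched in a few lines: a frame is kept when the accumulated a posteriori SNR-weighted energy distance since the last kept frame exceeds a threshold. The SNR proxy, parameter names and threshold below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def select_frames(frames, noise_energy, threshold=2.0):
    """Variable frame rate selection sketch (illustrative, not the
    paper's implementation): keep a frame when the accumulated
    a posteriori SNR-weighted energy distance since the last kept
    frame exceeds a threshold."""
    energy = np.sum(frames ** 2, axis=1)
    log_energy = np.log(energy + 1e-10)
    # a posteriori SNR proxy: frame energy over an estimated noise-floor energy
    weight = np.log(np.maximum(energy / noise_energy, 1.0))
    kept = [0]
    acc = 0.0
    for t in range(1, len(frames)):
        # energy distance to the most recently kept frame, SNR-weighted
        acc += weight[t] * abs(log_energy[t] - log_energy[kept[-1]])
        if acc > threshold:
            kept.append(t)
            acc = 0.0
    return kept

# tiny demo: 50 near-silent frames followed by 50 "speech" frames
rng = np.random.default_rng(0)
demo = np.concatenate([0.01 * rng.standard_normal((50, 160)),
                       rng.standard_normal((50, 160))])
kept = select_frames(demo, noise_energy=0.01 ** 2 * 160)
```

Because near-silent frames get a weight close to zero, almost all selected frames land in the high-energy region, mirroring the behaviour reported in the abstract (few or no frames assigned to silence).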
Speech Communication | 2005
Zheng-Hua Tan; Paul Dalsgaard; Børge Lindberg
Abstract The past decade has witnessed a growing interest in deploying automatic speech recognition (ASR) in communication networks. Networks such as wireless networks present a number of challenges due to, e.g., bandwidth constraints and transmission errors. The introduction of distributed speech recognition (DSR) largely eliminates the bandwidth limitations, and the presence of transmission errors becomes the key robustness issue. This paper reviews the techniques that have been developed for ASR robustness against transmission errors. A model of network degradations and robustness techniques is presented, and these techniques are classified into three categories: error detection, error recovery and error concealment (EC). A one-frame error detection scheme is described and compared with a frame-pair scheme. As opposed to vector-level techniques, a technique for error detection and EC at the sub-vector level is presented. A number of error recovery techniques such as forward error correction and interleaving are discussed, in addition to a review of both feature-reconstruction and ASR-decoder based EC techniques. To enable the comparison of some of these techniques, evaluation has been conducted on the basis of the same speech database and channel. Special attention is given to the unique characteristics of DSR as compared to streaming audio, e.g. voice-over-IP. Additionally, a technique for adapting ASR to the varying quality of networks is presented, in which the frame error rate is used to adjust the discrimination threshold with the goal of optimising out-of-vocabulary detection. The paper concludes with a discussion of the applicability of the different techniques based on channel characteristics and system requirements.
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Zheng-Hua Tan; Paul Dalsgaard; Børge Lindberg
In this paper, the temporal correlation of speech is exploited in front-end feature extraction, client-based error recovery, and server-based error concealment (EC) for distributed speech recognition. First, the paper investigates a half frame rate (HFR) front-end that uses double frame shifting at the client side. At the server side, each HFR feature vector is duplicated to construct a full frame rate (FFR) feature sequence. This HFR front-end gives comparable performance to the FFR front-end but contains only half the FFR features. Second, different arrangements of the other half of the FFR features create a set of error recovery techniques encompassing multiple description coding and interleaving schemes, where interleaving has the advantage of not introducing a delay when there are no transmission errors. Third, a subvector-based EC technique is presented in which error detection and concealment are conducted at the subvector level, as opposed to conventional techniques where an entire vector is replaced even though only a single bit error occurs. The subvector EC is further combined with weighted Viterbi decoding. Encouraging recognition results are observed for the proposed techniques. Lastly, to understand the effects of applying the various EC techniques, the paper introduces three analysis approaches based on speech feature, dynamic programming distance, and hidden Markov model state duration comparison.
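The interleaving idea mentioned above can be illustrated with a plain row-in/column-out block interleaver: frames are written row-wise into a depth-row matrix and transmitted column-wise, so a burst loss of consecutive transmitted frames maps back to widely separated source frames. The depth value and the multiple-of-depth assumption are illustrative, not the paper's configuration.

```python
def interleave(seq, depth):
    """Row-in/column-out block interleaver sketch (illustrative).
    Assumes len(seq) is a multiple of depth."""
    width = len(seq) // depth
    # read column-wise out of a matrix filled row-wise
    return [seq[r * width + c] for c in range(width) for r in range(depth)]

def deinterleave(seq, depth):
    """Invert interleave() for the same depth."""
    width = len(seq) // depth
    out = [None] * len(seq)
    k = 0
    for c in range(width):
        for r in range(depth):
            out[r * width + c] = seq[k]
            k += 1
    return out

# demo: 12 frame indices, depth-3 interleaving
frames = list(range(12))
tx = interleave(frames, 3)
# any burst of consecutive lost tx frames now hits source frames
# that are at least `width` (= 4) positions apart
```

Note the trade-off the abstract points at: a block interleaver of depth d delays decoding by up to d frames when losses occur, whereas the paper's arrangement avoids delay in the error-free case.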
international conference on acoustics, speech, and signal processing | 2000
Heidi Christensen; Børge Lindberg; Ove Kjeld Andersen
A multi-stream speech recogniser is based on the combination of multiple feature streams, each containing complementary information. In the past, multi-stream research has typically focused on systems that use a single feature extraction method. This heritage from conventional speech recognisers is an unnecessary restriction, and both psychoacoustic and phonetic knowledge strongly motivate the use of heterogeneous features. In this paper we investigate how heterogeneous processing can be used in two different multi-stream configurations: first, a system where each stream handles a different frequency region of the speech (a multi-band recogniser) and, second, a multi-stream recogniser where each stream handles the full frequency region. For each type of system we compare the performance using both homogeneous and heterogeneous processing. We demonstrate that the use of heterogeneous information significantly improves clean speech recognition performance, motivating further exploration of more specifically designed stream processing.
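A common way to combine such streams, sketched below under assumed conventions, is a weighted product of per-stream class posteriors, i.e. a weighted sum in the log domain followed by renormalisation. The uniform weights and the product rule are illustrative choices; the paper itself compares several combination schemes.

```python
import numpy as np

def combine_streams(stream_log_posteriors, weights=None):
    """Multi-stream combination sketch: merge per-frame class
    log-posteriors from each stream by a weighted sum in the log
    domain (a weighted product of posteriors), then renormalise."""
    stacked = np.stack(stream_log_posteriors)          # (streams, frames, classes)
    if weights is None:
        weights = np.full(len(stacked), 1.0 / len(stacked))
    combined = np.tensordot(weights, stacked, axes=1)  # (frames, classes)
    # renormalise so each frame's posteriors sum to one
    combined -= np.log(np.sum(np.exp(combined), axis=-1, keepdims=True))
    return combined

# demo: two streams, two frames, three classes
p1 = np.log([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
p2 = np.log([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2]])
merged = combine_streams([p1, p2])
```

With uniform weights this reduces to a per-class geometric mean, so a class must score well in every stream to dominate, which is what makes product-rule combination attractive when the streams carry complementary evidence.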
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Haitian Xu; Paul Dalsgaard; Zheng-Hua Tan; Børge Lindberg
A condition-dependent training strategy divides a training database into a number of clusters, each corresponding to a noise condition, and subsequently trains a hidden Markov model (HMM) set for each cluster. This paper investigates and compares a number of condition-dependent training strategies in order to achieve a better understanding of the effects on automatic speech recognition (ASR) performance caused by splitting the training database. The effect of mismatches in signal-to-noise ratio (SNR) is also analyzed. The results show that splitting the training material in terms of both noise type and SNR value is advantageous compared to previously used methods, and that training only a limited number of HMM sets per noise type is sufficient for robust handling of SNR mismatches. This leads to the introduction of an SNR and noise classification-based training strategy (SNT-SNC). Better ASR performance is obtained on test material containing data from known noise types than with either multicondition training or noise-type-dependent training strategies. The computational complexity of the SNT-SNC framework is kept low by choosing only one HMM set for recognition; the HMM set is chosen on the basis of noise classification results and SNR estimates. However, compared to the other strategies, the SNT-SNC framework shows lower performance for unknown noise types. This problem is partly overcome by introducing a number of model- and feature-domain techniques. Experiments using both artificially corrupted and real-world noisy speech databases demonstrate the effectiveness of these methods.
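The selection step of such a strategy reduces to a lookup: an estimated noise type and SNR pick exactly one condition-dependent HMM set for decoding. The SNR band edges, dictionary keys and the multicondition fallback below are illustrative assumptions, not the paper's configuration.

```python
import bisect

def select_hmm_set(model_bank, noise_type, snr_db, snr_edges=(5.0, 15.0)):
    """Sketch of the model-selection step in an SNR and noise
    classification-based training strategy: classify the estimated SNR
    into a band and pick the single matching condition-dependent HMM set.
    Band edges and keys are illustrative assumptions."""
    band = bisect.bisect(snr_edges, snr_db)   # 0 = low, 1 = mid, 2 = high
    key = (noise_type, band)
    # unseen noise types fall back to a multicondition-trained set
    return model_bank.get(key, model_bank[('multicondition', band)])

# demo: a hypothetical bank of condition-dependent HMM sets
bank = {('babble', 0): 'hmm_babble_low',
        ('babble', 1): 'hmm_babble_mid',
        ('babble', 2): 'hmm_babble_high',
        ('multicondition', 0): 'hmm_mc_low',
        ('multicondition', 1): 'hmm_mc_mid',
        ('multicondition', 2): 'hmm_mc_high'}
```

Decoding with a single selected set is what keeps runtime cost at the level of a conventional recogniser, at the price of sensitivity to noise-classification errors for unseen noise types, as the abstract notes.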
international conference on acoustics, speech, and signal processing | 2004
Zheng-Hua Tan; Paul Dalsgaard; Børge Lindberg
Conventional error concealment (EC) algorithms for distributed speech recognition (DSR) share a common characteristic, namely that EC is conducted at the vector (or frame) level. This strategy, however, fails to exploit the error-free fraction of erroneous vectors, in which a substantial number of subvectors are often error-free. This paper proposes a novel EC approach for DSR encoded by split vector quantization (SVQ), in which detected erroneous vectors are submitted to further analysis at the subvector level. Specifically, a data consistency test is applied to each erroneous vector to identify inconsistent subvectors. Only inconsistent subvectors are replaced by their nearest neighbouring consistent subvectors, whereas consistent subvectors are kept untouched. Experimental results demonstrate that the proposed algorithm is superior to conventional EC methods in terms of recognition accuracy while having almost the same complexity and resource requirements.
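The approach can be sketched for a single SVQ subvector stream as follows: within frames flagged as erroneous, a subvector passes the consistency test if it lies close to a reliable neighbour, and only failing subvectors are substituted. The Euclidean distance and threshold stand in for the paper's actual consistency criterion and are purely illustrative.

```python
import numpy as np

def subvector_ec(stream, bad_frames, threshold=1.0):
    """Subvector-level error concealment sketch for one SVQ subvector
    stream. Within frames flagged erroneous, keep subvectors that are
    consistent with a reliable neighbour; replace the rest with the
    nearest (in time) reliable subvector. Distance measure and
    threshold are illustrative assumptions."""
    out = stream.copy()
    bad = set(bad_frames)
    n = len(stream)
    for t in sorted(bad):
        # nearest reliable (error-free) frames on each side
        left = next((i for i in range(t - 1, -1, -1) if i not in bad), None)
        right = next((i for i in range(t + 1, n) if i not in bad), None)
        refs = [stream[i] for i in (left, right) if i is not None]
        if not refs:
            continue
        # consistency test: keep the received subvector if it is close
        # to at least one reliable neighbour
        if min(np.linalg.norm(stream[t] - r) for r in refs) < threshold:
            continue
        # inconsistent: substitute the nearest reliable subvector in time
        if left is not None and (right is None or t - left <= right - t):
            out[t] = stream[left]
        else:
            out[t] = stream[right]
    return out

# demo: a smooth subvector trajectory with two frames flagged erroneous;
# frame 2 is actually intact, frame 4 carries a corrupted value
traj = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.2],
                 [0.3, 0.3], [10.0, 10.0], [0.5, 0.5]])
healed = subvector_ec(traj, bad_frames=[2, 4])
```

The intact subvector in a flagged frame survives the test and is kept, which is exactly the information a frame-level scheme would throw away.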
Biennial on DSP for in-Vehicle and Mobile Systems | 2007
Haitian Xu; Zheng-Hua Tan; Paul Dalsgaard; Ralf Mattethat; Børge Lindberg
The growth in wireless communication and mobile devices has supported the development of distributed speech recognition (DSR) technology. During the last decade this has led to the establishment of ETSI-DSR standards and an increased interest in research aimed at systems exploiting DSR. So far, however, DSR-based systems executing on mobile devices are only in their infancy. One of the reasons is the lack of easy-to-use software development packages. This chapter presents a prototype version of a configurable DSR system for the development of speech enabled applications on mobile devices.
IEEE Signal Processing Letters | 2008
Haitian Xu; Zheng-Hua Tan; Paul Dalsgaard; Børge Lindberg
The nonlocal means (NL-means) algorithm, recently proposed for image denoising, has proved highly effective at removing additive noise while to a large extent maintaining image details. The algorithm performs denoising by averaging each pixel with other pixels that have similar characteristics in the image. This letter considers the real and imaginary parts of the complex speech spectrogram each as a separate image and applies a modified NL-means algorithm to them for denoising, in order to improve the noise robustness of speech recognition. Recognition results on a noisy speech database show that the proposed method is superior to classical methods such as spectral subtraction.
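A minimal NL-means pass over one such 2-D array (e.g. the real part of a spectrogram) looks as follows: each point becomes a weighted average of points in a search window, with weights determined by patch similarity. The window sizes and the filtering parameter h are illustrative; the letter's modified algorithm is not reproduced here.

```python
import numpy as np

def nl_means(img, patch=3, search=7, h=0.5):
    """Minimal NL-means sketch on a 2-D array: each point is replaced
    by a similarity-weighted average over a search window, where the
    weight depends on the mean squared difference between patches.
    Window sizes and h are illustrative choices."""
    pad = patch // 2
    padded = np.pad(img, pad, mode='reflect')
    out = np.zeros_like(img, dtype=float)
    rows, cols = img.shape
    s = search // 2
    for i in range(rows):
        for j in range(cols):
            p = padded[i:i + patch, j:j + patch]   # patch around (i, j)
            wsum = 0.0
            acc = 0.0
            for di in range(max(0, i - s), min(rows, i + s + 1)):
                for dj in range(max(0, j - s), min(cols, j + s + 1)):
                    q = padded[di:di + patch, dj:dj + patch]
                    d2 = np.mean((p - q) ** 2)
                    w = np.exp(-d2 / h ** 2)       # patch-similarity weight
                    wsum += w
                    acc += w * img[di, dj]
            out[i, j] = acc / wsum
    return out

# demo: a constant "spectrogram" plus additive noise
rng = np.random.default_rng(1)
noisy = 1.0 + 0.1 * rng.standard_normal((8, 8))
denoised = nl_means(noisy)
```

On a flat region all patches look alike, so the weights are near-uniform and the additive noise is averaged out, while in real spectrograms dissimilar patches receive small weights and structure is preserved.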
international conference on pattern recognition | 2010
Zheng-Hua Tan; Børge Lindberg
The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within command and control, text entry and search are presented with an emphasis on mobile text entry.