Publication


Featured research published by Lee Ngee Tan.


international conference on acoustics, speech, and signal processing | 2010

Voice activity detection using harmonic frequency components in likelihood ratio test

Lee Ngee Tan; Bengt J. Borgström; Abeer Alwan

This paper proposes a new statistical model-based likelihood ratio test (LRT) VAD to obtain reliable speech/non-speech decisions. In the proposed method, the likelihood ratio (LR) is calculated differently for voiced frames than for unvoiced frames: only DFT bins containing harmonic spectral peaks are selected for LR computation. To evaluate the new VAD's effectiveness in improving the noise-robustness of ASR, its decisions are applied to pre-processing techniques such as non-linear spectral subtraction, the minimum mean square error short-time spectral amplitude estimator, and frame dropping. In ASR experiments conducted on the Aurora2 database, the proposed harmonic frequency-based LRTs give better results than conventional LRT-based VADs and the standard G.729B and ETSI AMR VADs.
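The bin-selection idea can be illustrated with a minimal sketch. It assumes the widely used Gaussian-model per-bin log likelihood ratio, log Λ_k = γ_k ξ_k / (1 + ξ_k) − log(1 + ξ_k), where γ_k and ξ_k are the a posteriori and a priori SNRs; the abstract does not state which LR form the paper uses. The function names, toy SNR values, harmonic bin indices, and threshold are all hypothetical.

```python
import math

def log_lr(gamma, xi):
    """Per-bin log likelihood ratio under a Gaussian speech/noise model (assumed form)."""
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)

def frame_decision(gammas, xis, bins, threshold=0.5):
    """Average the log-LR over the chosen bins and compare with a threshold."""
    score = sum(log_lr(gammas[k], xis[k]) for k in bins) / len(bins)
    return score, score > threshold

# Toy voiced frame: 8 DFT bins, speech energy concentrated at bins 2 and 5.
gammas = [1.0, 1.1, 9.0, 0.9, 1.0, 7.0, 1.2, 0.8]  # a posteriori SNR (made up)
xis = [0.1, 0.1, 8.0, 0.1, 0.1, 6.0, 0.1, 0.1]     # a priori SNR (made up)

all_bins = range(len(gammas))
harmonic_bins = [2, 5]  # assumed output of some harmonic peak picker

score_all, speech_all = frame_decision(gammas, xis, all_bins)
score_harm, speech_harm = frame_decision(gammas, xis, harmonic_bins)
print(score_all, speech_all)
print(score_harm, speech_harm)
```

Restricting the average to harmonic bins keeps noise-dominated bins from diluting the score of a voiced frame, which is the intuition the abstract describes.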


Speech Communication | 2013

Multi-band summary correlogram-based pitch detection for noisy speech

Lee Ngee Tan; Abeer Alwan

A multi-band summary correlogram (MBSC)-based pitch detection algorithm (PDA) is proposed. The PDA performs pitch estimation and voiced/unvoiced (V/UV) detection via novel signal processing schemes that are designed to enhance the MBSC's peaks at the most likely pitch period. These peak-enhancement schemes include comb-filter channel-weighting to yield each individual subband's summary correlogram (SC) stream, and stream-reliability-weighting to combine these SCs into a single MBSC. V/UV detection is performed by applying a constant threshold on the maximum peak of the enhanced MBSC. Narrowband noisy speech signals sampled at 8 kHz are generated from the Keele (development set) and CSTR (Centre for Speech Technology Research; evaluation set) corpora. Both 4-kHz fullband speech and G.712-filtered telephone speech are simulated. When evaluated solely on pitch estimation accuracy, assuming voicing detection is perfect, the proposed algorithm has the lowest gross pitch error for noisy speech in the evaluation set among the algorithms evaluated (RAPT, YIN, etc.). The proposed PDA also achieves the lowest average pitch detection error when both pitch estimation and voicing detection errors are taken into account.
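As a rough illustration of combining per-stream correlograms and thresholding the peak, the sketch below weights two normalised autocorrelation streams and declares a frame voiced when the largest peak in the allowed lag range exceeds a constant. It is a simplification, not the MBSC algorithm itself: the comb-filter channel weighting is omitted, and the stream weights, lag range, and threshold are made-up values.

```python
import math

def normalized_autocorr(x, max_lag):
    """Autocorrelation for lags 0..max_lag, normalised by the lag-0 energy."""
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
            for lag in range(max_lag + 1)]

def combine_streams(streams, weights):
    """Reliability-weighted average of per-stream correlograms."""
    total = sum(weights)
    return [sum(w * s[lag] for w, s in zip(weights, streams)) / total
            for lag in range(len(streams[0]))]

def detect_pitch(summary, min_lag, max_lag, threshold):
    """Return (lag or None, peak height); None means the frame is unvoiced."""
    lag = max(range(min_lag, max_lag + 1), key=lambda l: summary[l])
    voiced = summary[lag] > threshold
    return (lag if voiced else None), summary[lag]

FS = 8000          # sampling rate in Hz, as in the abstract
FRAME_LEN = 320    # 40 ms frame (hypothetical choice)

# Two toy "subband" signals sharing a 200 Hz pitch: the fundamental and its 5th harmonic.
fundamental = [math.sin(2 * math.pi * 200 * n / FS) for n in range(FRAME_LEN)]
harmonic5 = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(FRAME_LEN)]

streams = [normalized_autocorr(fundamental, 100),
           normalized_autocorr(harmonic5, 100)]
summary = combine_streams(streams, weights=[0.7, 0.3])  # weights are assumptions
lag, peak = detect_pitch(summary, min_lag=20, max_lag=100, threshold=0.5)
if lag is not None:
    print("estimated F0 (Hz):", FS / lag)
```

The harmonic stream alone has peaks at every multiple of its own 8-sample period, but only the true pitch period is a peak in both streams, so the weighted sum favours it.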


international conference on acoustics, speech, and signal processing | 2013

A robust automatic bird phrase classifier using dynamic time-warping with prominent region identification

Kantapon Kaewtip; Lee Ngee Tan; Abeer Alwan; Charles E. Taylor

In this paper, we present a novel approach to birdsong phrase classification using template-based techniques suitable even for limited training data and noisy environments. The algorithm utilizes dynamic time-warping and prominent (high-energy) time-frequency regions of training spectrograms to derive templates. The algorithm is evaluated on 32 classes of Cassin's Vireo bird phrases. Using only three training examples per class, our algorithm yields a phrase accuracy of 96.23%, outperforming other classifiers (e.g. 85.21% classification accuracy of SVM). In the presence of additive noise (10 dB SNR degradation), the proposed classifier does not degrade significantly, compared to others.
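For readers unfamiliar with dynamic time-warping, the following minimal sketch shows template matching with the textbook DTW recursion. It works on 1-D toy contours rather than spectrogram frames, and it leaves out the prominent-region identification described above; the template names and data are invented for illustration.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW alignment cost between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]

def classify(query, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))

# Hypothetical one-template-per-class setup.
templates = {
    "rising": [1, 2, 3, 4, 5],
    "falling": [5, 4, 3, 2, 1],
}
query = [1, 1, 2, 3, 3, 4, 5, 5]  # a time-stretched rising contour
print(classify(query, templates))
```

Because DTW absorbs the difference in duration, the stretched query aligns with the rising template at zero cost even though the two sequences have different lengths.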


international conference on acoustics, speech, and signal processing | 2011

Noise-robust F0 estimation using SNR-weighted summary correlograms from multi-band comb filters

Lee Ngee Tan; Abeer Alwan

A noise-robust, signal-to-noise ratio (SNR)-weighted correlogram-based pitch estimation algorithm (PEA) in which a bank of comb filters operates in each of the low, mid, and high frequency bands is proposed. Correlograms are obtained by applying autocorrelations directly on the low-frequency filterbank (FBK) output, and on the output envelopes of all 3 FBKs. An SNR-weighting scheme is used for channel selection to yield a summary correlogram for each FBK. These summary correlograms are averaged to obtain an overall summary correlogram, which is time-smoothed before peak extraction is performed. The final pitch contour is obtained via dynamic programming. The proposed PEA is evaluated on the Keele corpus with additive white or babble noises. In comparison with widely-used PEAs, the proposed PEA has the lowest overall gross pitch error (GPE), especially in low SNR cases.
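The dynamic-programming step can be pictured with a small Viterbi-style sketch that picks one pitch-lag candidate per frame, trading peak height against frame-to-frame jumps. The abstract does not give the cost function, so the linear jump penalty, the candidate lists, and the function name here are assumptions.

```python
def track_pitch(candidates, jump_penalty=0.01):
    """candidates[t] is a list of (lag, peak_height); return the best lag per frame."""
    # best[t][j]: highest cumulative score of any path ending at candidate j of frame t
    best = [[height for _, height in candidates[0]]]
    back = []  # back[t-1][j]: index of the best predecessor in frame t-1
    for t in range(1, len(candidates)):
        scores, ptrs = [], []
        for lag, height in candidates[t]:
            options = [best[-1][i] - jump_penalty * abs(lag - prev_lag)
                       for i, (prev_lag, _) in enumerate(candidates[t - 1])]
            i_best = max(range(len(options)), key=options.__getitem__)
            scores.append(options[i_best] + height)
            ptrs.append(i_best)
        best.append(scores)
        back.append(ptrs)
    # Backtrack from the best final candidate.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptrs in reversed(back):
        j = ptrs[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j][0] for t, j in enumerate(path)]

# Toy correlogram peaks: in the middle frame the tallest peak is an octave error (lag 80).
candidates = [
    [(40, 0.90), (80, 0.70)],
    [(40, 0.60), (80, 0.65)],
    [(41, 0.90), (82, 0.70)],
]
print(track_pitch(candidates))
```

Picking the tallest peak frame by frame would give lag 80 in the middle frame; the continuity penalty keeps the contour near lag 40 instead.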


international conference on acoustics, speech, and signal processing | 2013

Bird phrase segmentation by entropy-driven change point detection

Ni-Chun Wang; Ralph E. Hudson; Lee Ngee Tan; Charles E. Taylor; Abeer Alwan; Kung Yao

A bird phrase segmentation method using entropy-based change point detection is proposed. Spectrograms of bird calls are usually sparse while the background noise is relatively white. Therefore, considering the entropy of a sliding time-frequency block on the spectrogram, the entropy dips when a signal is present and rises when the signal ends. Rather than applying a hard threshold on the entropy to determine the beginning and ending of a signal, Bayesian change point detection is used to detect the statistical changes in the entropy sequence. In tests on a database of Cassin's Vireo (Vireo cassinii) recordings, our proposed segmentation method, with spectral subtraction or a novel spectral whitening method as the front-end, generates more accurate time labels and a lower false alarm rate than the conventional time-domain energy detection method, and achieves a high phrase classification rate.
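The entropy behaviour described above can be reproduced on a toy spectrogram. This sketch only computes the Shannon entropy of a sliding time-frequency block; the Bayesian change point detection stage is not shown, and the block width, spectrogram values, and function names are hypothetical.

```python
import math

def block_entropy(spec, t0, width):
    """Shannon entropy (bits) of normalised energy in frames t0..t0+width-1, all bins."""
    cells = [spec[f][t] for f in range(len(spec)) for t in range(t0, t0 + width)]
    total = sum(cells)
    if total <= 0.0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in cells if c > 0.0)

def entropy_sequence(spec, width):
    """Entropy of every block position along the time axis."""
    n_frames = len(spec[0])
    return [block_entropy(spec, t, width) for t in range(n_frames - width + 1)]

# Toy spectrogram: 4 frequency bins x 8 frames of flat (white) background energy,
# with a narrowband "call" in bin 1 during frames 3..5.
spec = [[1.0] * 8 for _ in range(4)]
for t in (3, 4, 5):
    spec[1][t] = 50.0

seq = entropy_sequence(spec, width=2)
print([round(h, 2) for h in seq])
```

Blocks that contain only background reach the maximum entropy of log2(8) = 3 bits, while blocks overlapping the call fall well below it, which is the dip that a change point detector would then localise.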


international conference on acoustics, speech, and signal processing | 2013

A sparse representation-based classifier for in-set bird phrase verification and classification with limited training data

Lee Ngee Tan; George Kossan; Martin L. Cody; Charles E. Taylor; Abeer Alwan

The performance of a sparse representation-based (SR) classifier for in-set bird phrase verification and classification is studied. The database contains phrases segmented from songs of the Cassin's Vireo (Vireo cassinii). Each test phrase belongs to one of 33 phrase classes - 32 in-set categories, and 1 collective out-of-set category. Only in-set phrases are used for training. From each phrase segment, spectrographic features are extracted, followed by dimension reduction using PCA. A threshold is applied on the sparsity concentration index (SCI) computed by the SR classifier, for in-set bird phrase verification using a limited number of training tokens (3 to 7) per phrase class. When evaluated against the nearest subspace (NS) and support vector machine (SVM) classifiers using the same framework, the SR classifier has the highest classification accuracy, due to its good performance in both the verification and classification tasks.
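A minimal sketch of SCI-based verification follows, assuming the commonly used definition SCI = (k · max_i ‖δ_i(x)‖₁ / ‖x‖₁ − 1) / (k − 1), where k is the number of classes and δ_i(x) keeps only the coefficients of class i. The abstract does not spell out the formula or the threshold, so both are assumptions here, and the sparse coefficient vectors are made up rather than solved for.

```python
def sci(coeffs, labels, num_classes):
    """Sparsity concentration index: 1 if all weight sits in one class, 0 if spread evenly."""
    total = sum(abs(c) for c in coeffs)
    if total == 0.0:
        return 0.0
    per_class = [0.0] * num_classes
    for c, label in zip(coeffs, labels):
        per_class[label] += abs(c)
    return (num_classes * max(per_class) / total - 1.0) / (num_classes - 1)

def verify_in_set(coeffs, labels, num_classes, threshold=0.5):
    """Accept the test token as in-set when its SCI reaches the threshold."""
    return sci(coeffs, labels, num_classes) >= threshold

# Dictionary with 3 classes and 2 training tokens each (hypothetical layout).
labels = [0, 0, 1, 1, 2, 2]
concentrated = [0.9, 0.1, 0.0, 0.0, 0.0, 0.0]  # weight on a single class
spread = [0.2, 0.1, 0.2, 0.1, 0.2, 0.1]        # weight shared across classes

print(sci(concentrated, labels, 3), verify_in_set(concentrated, labels, 3))
print(sci(spread, labels, 3), verify_in_set(spread, labels, 3))
```

An out-of-set phrase has no matching class in the dictionary, so its coefficients tend to spread across classes and yield a low SCI, which is why a threshold on SCI can act as a rejection test.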


international conference on acoustics, speech, and signal processing | 2015

Bird-phrase segmentation and verification: A noise-robust template-based approach

Kantapon Kaewtip; Lee Ngee Tan; Charles E. Taylor; Abeer Alwan

In this paper, we present a birdsong-phrase segmentation and verification algorithm that is robust to limited training data, class variability, and noise. The algorithm comprises a noise-robust, Dynamic-Time-Warping (DTW)-based segmentation and a discriminative classifier for outlier rejection. The algorithm utilizes DTW and prominent (high-energy) time-frequency regions of training spectrograms to derive a reliable noise-robust template for each phrase class. The resulting template is then used for segmenting continuous recordings to obtain segment candidates whose spectrogram amplitudes in the prominent regions are used as features for a Support Vector Machine (SVM). The algorithm is evaluated on Cassin's Vireo recordings; our proposed system yields low Equal Error Rates (EER) and segment boundaries that are close to those obtained from manual annotations, and it performs better than energy- or entropy-based birdsong segmentation algorithms. In the presence of additive noise (-10 to 10 dB SNR), the proposed phrase detection system does not degrade as significantly as the other algorithms do.


international conference on acoustics, speech, and signal processing | 2014

Feature enhancement using sparse reference and estimated soft-mask exemplar-pairs for noisy speech recognition

Lee Ngee Tan; Abeer Alwan

A feature enhancement technique for noise-robust speech recognition is proposed. Existing sparse exemplar-based feature enhancement methods use clean speech and pure noise Mel-spectral exemplars, or clean and noisy speech log-Mel-spectral exemplar-pairs, in their dictionaries. In contrast, the proposed technique constructs its dictionaries using reference soft-mask (SMref) and estimated soft-mask (SMest) exemplar-pairs derived from the training data. The sparse linear combination of SMest dictionary exemplars that best represents the test utterance's SMest is obtained by solving an L1-minimization problem. This sparse linear combination is applied to the SMref exemplar dictionary to generate an enhanced soft-mask for denoising the utterance's Mel-spectra before MFCC extraction. On the Aurora-2 noisy speech recognition task, the proposed algorithm outperforms other sparse Mel-spectral exemplar-based feature enhancement schemes when mismatch exists between the dictionary exemplars and the test set. A preliminary experiment on Aurora-4 shows similar trends.
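The last two steps, reusing the sparse weights on the paired SMref dictionary and masking the Mel spectrum, might look like the sketch below. It assumes the weights have already been found by an L1 solver (not shown), treats each exemplar as a single frame, and clips the mask to [0, 1]; the clipping, the toy dictionary, and the function names are assumptions rather than details from the abstract.

```python
def enhanced_mask(weights, ref_dictionary):
    """Apply sparse weights (found on the SMest side) to the paired SMref exemplars."""
    n_bands = len(ref_dictionary[0])
    mask = [sum(w * atom[b] for w, atom in zip(weights, ref_dictionary))
            for b in range(n_bands)]
    # Keep the result a valid soft mask (assumed post-processing step).
    return [min(1.0, max(0.0, m)) for m in mask]

def denoise(mel_spectrum, mask):
    """Element-wise masking of a noisy Mel spectrum."""
    return [m * s for m, s in zip(mask, mel_spectrum)]

# Two hypothetical SMref exemplars over 3 Mel bands.
ref_dictionary = [
    [1.0, 0.8, 0.1],
    [0.2, 0.9, 0.9],
]
weights = [0.5, 0.5]            # stand-in for the L1-minimization result
noisy_mel = [10.0, 20.0, 30.0]  # made-up noisy Mel energies

mask = enhanced_mask(weights, ref_dictionary)
print(mask)
print(denoise(noisy_mel, mask))
```

The point of the exemplar-pair construction is that weights estimated on noisy-looking masks (SMest) are transferred to their clean-reference counterparts (SMref), so the resulting mask can be cleaner than the one estimated directly from the test utterance.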


international conference on signal and information processing | 2013

Change point detection methodology used for segmenting bird songs

Ni-Chun Wang; Ralph E. Hudson; Lee Ngee Tan; Charles E. Taylor; Abeer Alwan; Kung Yao

A bird phrase segmentation method using entropy-based change point detection is proposed. Spectrograms of bird calls are usually sparse while the background noise is relatively white. Therefore, considering the entropy of a sliding time-frequency block on the spectrogram, the entropy dips when a signal is present and rises when the signal ends. Rather than applying a hard threshold on the entropy to determine the beginning and ending of a signal, Bayesian change point detection is used to detect the statistical changes in the entropy sequence. In tests on a database of Cassin's Vireo (Vireo cassinii) recordings, our proposed segmentation method, with spectral subtraction or a novel spectral whitening method as the front-end, generates more accurate time labels and a lower false alarm rate than the conventional time-domain energy detection method, and achieves a high phrase classification rate.


Journal of the Acoustical Society of America | 2012

Automated entropy-based bird phrase segmentation on sparse representation classifier

Ni-Chun Wang; Lee Ngee Tan; Ralph E. Hudson; George Kossan; Abeer Alwan; Kung Yao; Charles E. Taylor

An automated system capable of reliably segmenting and classifying bird phrases would help analyze field recordings. Here we describe a phrase segmentation method using entropy-based change-point detection. Spectrograms of bird calls are often very sparse while the background noise is relatively white. Therefore, considering the entropy of a sliding time-frequency window on the spectrogram, the entropy dips when a signal is present and rises back up when the signal ends. Rather than a simple threshold on the entropy to determine the beginning and end of a signal, a Bayesian recursion-based change-point detection (CPD) method is used to detect sudden changes in the entropy sequence. CPD reacts only to those statistical changes, so it generates more accurate time labels and a lower false alarm rate than conventional energy detection methods. The segmented phrases are then used for training and testing a sparse representation (SR) classifier, which performs phrase classification by a sparse linear combination...

Collaboration


Dive into Lee Ngee Tan's collaboration.

Top Co-Authors

Abeer Alwan, University of California
Ni-Chun Wang, University of California
George Kossan, University of California
Kung Yao, University of California
Martin L. Cody, University of California
Adam Janin, University of California