Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jonathan William Dennis is active.

Publication


Featured research published by Jonathan William Dennis.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Image Feature Representation of the Subband Power Distribution for Robust Sound Event Classification

Jonathan William Dennis; Huy Dat Tran; Eng Siong Chng

The ability to automatically recognize a wide range of sound events in real-world conditions is an important part of applications such as acoustic surveillance and machine hearing. Our approach takes inspiration from both the audio and image processing fields, and is based on transforming the sound into a two-dimensional representation and then extracting an image feature for classification. This provided the motivation for our previous work on the spectrogram image feature (SIF). In this paper, we propose a novel method to improve sound event classification performance in severely mismatched noise conditions. It is based on the subband power distribution (SPD) image - a novel two-dimensional representation that characterizes the spectral power distribution over time in each frequency subband. Here, the high-powered, reliable elements of the spectrogram are transformed to a localized region of the SPD, and hence can be easily separated from the noise. We then extract an image feature from the SPD, using the same approach as for the SIF, and develop a novel missing-feature classification approach based on a k-nearest neighbor (kNN) classifier. We carry out comprehensive experiments on a database of 50 environmental sound classes over a range of challenging noise conditions. The results demonstrate that the SPD-IF is both discriminative over the broad range of sound classes and robust in severe non-stationary noise.
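
As a rough illustration of the SPD idea (not the paper's exact pipeline), the sketch below builds a two-dimensional image by histogramming the dB power values of each spectrogram subband across time frames; the function name, STFT settings and histogram range are assumptions for illustration only.

```python
# Hypothetical sketch of an SPD-style image: for each frequency subband,
# histogram the (dB) power values observed across time frames.
import numpy as np
from scipy.signal import stft

def spd_image(x, fs, n_power_bins=64, db_range=(-80.0, 0.0)):
    """Return an (n_subbands, n_power_bins) image of per-subband power histograms."""
    _, _, Z = stft(x, fs=fs, nperseg=512, noverlap=256)
    power_db = 20.0 * np.log10(np.abs(Z) + 1e-10)   # (freq, time) spectrogram in dB
    power_db -= power_db.max()                      # normalise so the peak sits at 0 dB
    edges = np.linspace(db_range[0], db_range[1], n_power_bins + 1)
    spd = np.stack([np.histogram(row, bins=edges, density=True)[0]
                    for row in power_db])           # one histogram per subband
    return spd

# Usage: img = spd_image(signal, 16000); img can then feed an image-style
# feature extractor, as the SIF pipeline does for the raw spectrogram.
```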


International Conference on Acoustics, Speech, and Signal Processing | 2013

Temporal coding of local spectrogram features for robust sound recognition

Jonathan William Dennis; Qiang Yu; Huajin Tang; Huy Dat Tran; Haizhou Li

There is much evidence to suggest that the human auditory system uses localised time-frequency information for the robust recognition of sounds. Despite this, conventional systems typically rely on features extracted from short windowed frames over time, covering the whole frequency spectrum. Such approaches are not inherently robust to noise, as each frame will contain a mixture of the spectral information from noise and signal. Here, we propose a novel approach based on the temporal coding of Local Spectrogram Features (LSFs), which generate spikes that are used to train a Spiking Neural Network (SNN) with temporal learning. LSFs represent robust location information in the spectrogram surrounding keypoints, which are detected in a signal-driven manner such that the effect of noise on the temporal coding is reduced. Our experiments demonstrate the robust performance of our approach across a variety of noise conditions, such that it is able to outperform the conventional frame-based baseline methods.
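
A minimal sketch of the keypoint-to-spike idea, assuming keypoints are local maxima of the log spectrogram above a relative threshold; the detector, threshold and function names are illustrative and not the paper's exact LSF extraction.

```python
# Hypothetical sketch: detect spectrogram keypoints as local maxima above a
# relative threshold and emit each as a (spike time, frequency band) event.
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import stft

def spectrogram_keypoints(x, fs, rel_threshold_db=-40.0):
    _, t, Z = stft(x, fs=fs, nperseg=512, noverlap=256)
    S = 20.0 * np.log10(np.abs(Z) + 1e-10)
    S -= S.max()
    local_max = (S == maximum_filter(S, size=(5, 5)))  # signal-driven peak picking
    keep = local_max & (S > rel_threshold_db)
    freq_idx, time_idx = np.nonzero(keep)
    return t[time_idx], freq_idx                       # spike times and afferent indices

# Each (time, band) pair can drive one input afferent of a spiking network,
# so the spike timing carries the location information of the keypoints.
```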


IEEE Automatic Speech Recognition and Understanding Workshop | 2015

Single and multi-channel approaches for distant speech recognition under noisy reverberant conditions: I2R'S system description for the ASpIRE challenge

Jonathan William Dennis; Tran Huy Dat

In this paper, we introduce the system developed at the Institute for Infocomm Research (I2R) for the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge. The main components of the system are a front-end processing system consisting of a distributed beamforming algorithm that performs adaptive weighting and channel elimination, a speech dereverberation approach using a maximum-kurtosis criterion, and a robust voice activity detection (VAD) module based on the sub-harmonic ratio (SHR). The acoustic back-end consists of a multi-conditional Deep Neural Network (DNN) model that uses speaker-adapted features, combined with a decoding strategy that performs semi-supervised DNN model adaptation using weighted labels generated by the first-pass decoding output. On the single-microphone evaluation, our system achieved a word error rate (WER) of 44.8%. With the incorporation of beamforming on the multi-microphone evaluation, our system achieved a WER improvement of over 6% absolute, giving the best evaluation result of 38.5%.
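
For intuition about the front-end, here is a toy weighted delay-and-sum beamformer with channel elimination; the cross-correlation alignment, the crude quality proxy and the weighting rule are assumptions for illustration, not the distributed algorithm used in the actual system.

```python
# Hypothetical sketch of weighted delay-and-sum beamforming with channel
# elimination: align each channel to a reference, weight by a crude quality
# estimate, and drop channels that fall below a quality floor.
import numpy as np

def delay_and_sum(channels, quality_floor_db=0.0):
    """channels: (n_mics, n_samples) array of equal-length recordings."""
    ref = channels[0]
    out = np.zeros(len(ref), dtype=float)
    total_w = 0.0
    for ch in channels:
        lag = np.argmax(np.correlate(ch, ref, mode="full")) - (len(ref) - 1)
        aligned = np.roll(ch, -lag)
        # crude quality proxy: dynamic range of frame energies (dB)
        frames = aligned[: len(aligned) // 512 * 512].reshape(-1, 512)
        energies = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
        quality_db = np.percentile(energies, 90) - np.percentile(energies, 10)
        if quality_db < quality_floor_db:
            continue                                  # channel elimination
        w = 10.0 ** (quality_db / 20.0)
        out += w * aligned
        total_w += w
    return out / max(total_w, 1e-12)
```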


International Conference on Acoustics, Speech, and Signal Processing | 2015

Combining robust spike coding with spiking neural networks for sound event classification

Jonathan William Dennis; Huy Dat Tran; Haizhou Li

This paper proposes a novel biologically inspired method for sound event classification which combines spike coding with a spiking neural network (SNN). Our spike coding extracts keypoints that represent the local maxima of the sound spectrogram, and encodes them based on their local time-frequency information; hence both location and spectral information are extracted. We then design a modified tempotron SNN that, unlike the original tempotron, allows the network to learn the temporal distribution of the spike coding input, in a way analogous to the generalized Hough transform. The proposed method simultaneously enhances the sparsity of the sound event spectrogram, producing a representation which is robust against noise, and maximises the discriminability of the spike coding input in terms of its temporal information, which is important for sound event classification. Experimental results on a large dataset of 50 environmental sound events show the superiority of both the spike coding over the raw spectrogram and the SNN over conventional cross-entropy neural networks.
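
A small sketch of a tempotron-style neuron using the standard double-exponential post-synaptic kernel: the membrane potential is a weighted sum of kernels triggered by input spike times, and the neuron fires if the potential crosses threshold. The time constants and threshold are illustrative defaults, and the modified learning rule described in the paper is not shown.

```python
# Hypothetical sketch of a tempotron-style neuron (Gütig & Sompolinsky style):
# membrane potential = weighted sum of PSP kernels at the input spike times.
import numpy as np

def psp_kernel(dt, tau=0.015, tau_s=0.00375):
    """Double-exponential post-synaptic kernel, peak-normalised to 1 (times in seconds)."""
    t_peak = tau * tau_s / (tau - tau_s) * np.log(tau / tau_s)
    v0 = 1.0 / (np.exp(-t_peak / tau) - np.exp(-t_peak / tau_s))
    dt = np.asarray(dt, dtype=float)
    k = np.zeros_like(dt)
    pos = dt > 0
    k[pos] = v0 * (np.exp(-dt[pos] / tau) - np.exp(-dt[pos] / tau_s))
    return k

def tempotron_fires(spike_times, weights, threshold=1.0, duration=1.0):
    """spike_times: list of arrays (one per afferent); weights: one weight per afferent."""
    t_grid = np.linspace(0.0, duration, 2000)
    v = np.zeros_like(t_grid)
    for w, times in zip(weights, spike_times):
        for t_i in np.atleast_1d(times):
            v += w * psp_kernel(t_grid - t_i)
    return bool(np.any(v >= threshold)), v
```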


International Conference on Acoustics, Speech, and Signal Processing | 2014

Generalized Gaussian Distribution Kullback-Leibler kernel for robust sound event recognition

Tran Huy Dat; Ng Wen Zheng Terence; Jonathan William Dennis; Leng Yi Ren

In previous work, we developed a spectrogram image feature extraction framework for robust sound event recognition. The basic idea is to extract useful information from the 2D time-frequency representation of the sound signal in order to build feature extraction and classification that remain effective under noisy conditions. In this paper, we propose a novel robust spectrogram image method whose key is the observed sparsity of the sound spectrogram image in wavelet representations, which is modeled by Generalized Gaussian Distributions (GGDs). Furthermore, a Generalized Gaussian Distribution Kullback-Leibler (GGD-KL) kernel SVM is developed to embed this probabilistic distance into the quadratic programming machine to optimize the classification. Experimental results show the superiority of the proposed method over our previous works and the state of the art in the field.
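
As a sketch of the kernel construction, the snippet below uses the closed-form KL divergence between two zero-mean generalized Gaussians (Do & Vetterli, 2002) and an exponential kernel over the symmetrised divergence; the exponential form, the per-subband summation and the parameter names are assumptions and may differ from the exact kernel used in the paper.

```python
# Hypothetical sketch of a GGD Kullback-Leibler kernel: each wavelet subband is
# summarised by a zero-mean generalized Gaussian (scale alpha, shape beta), and
# an exponential kernel over the symmetrised KL distance is used as a
# precomputed SVM kernel.
import numpy as np
from scipy.special import gammaln

def kl_ggd(a1, b1, a2, b2):
    """Closed-form KL divergence between two zero-mean GGDs (Do & Vetterli, 2002)."""
    term_log = (np.log(b1 * a2) + gammaln(1.0 / b2)
                - np.log(b2 * a1) - gammaln(1.0 / b1))
    term_ratio = (a1 / a2) ** b2 * np.exp(gammaln((b2 + 1.0) / b1) - gammaln(1.0 / b1))
    return term_log + term_ratio - 1.0 / b1

def ggd_kl_kernel(p, q, gamma=1.0):
    """p, q: lists of per-subband (alpha, beta) pairs; returns exp(-gamma * symmetric KL)."""
    d = sum(kl_ggd(a1, b1, a2, b2) + kl_ggd(a2, b2, a1, b1)
            for (a1, b1), (a2, b2) in zip(p, q))
    return np.exp(-gamma * d)
```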


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Generalized Hough transform for speech pattern classification

Jonathan William Dennis; Huy Dat Tran; Haizhou Li

While typical hybrid neural network architectures for automatic speech recognition (ASR) use a context window of frame-based features, this may not be the best approach to capture the wider temporal context, which contains phonetic and linguistic information that is equally important. In this paper, we introduce a system that integrates both the spectral and geometrical shape information from the acoustic spectrum, inspired by research in the field of machine vision. In particular, we focus on the Generalized Hough Transform (GHT), which is a sophisticated technique that can model the geometrical distribution of speech information over the wider temporal context. To integrate the GHT as part of a hybrid-ASR system, we propose to use a neural network, with features derived from the probabilistic Hough voting step of the GHT, to implement an improved version of the GHT where the output of the network represents the conventional target class posteriors. A major advantage of our approach is that each step of the GHT is highly interpretable, particularly compared to deep neural network (DNN) systems which are commonly treated as powerful black-box classifiers that give little insight into how the output is achieved. Experiments are carried out on two speech pattern classification tasks. The first is the TIMIT phoneme classification, which demonstrates the performance of the approach on a standard ASR task. The second is a spoken word recognition challenge, which highlights the flexibility of the approach to capture phonetic information within a longer temporal context.
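
The voting step can be pictured with the toy accumulator below: matched codebook entries cast weighted votes for class centres at learned time offsets, and the per-class accumulator values serve as activations for a downstream network. The data structures, and the use of the accumulator maximum as a per-class feature, are illustrative assumptions rather than the paper's exact formulation.

```python
# Hypothetical sketch of generalized-Hough-style voting over time.
import numpy as np

def hough_votes(keypoints, codebook, n_classes, n_frames):
    """keypoints: list of (frame_index, codeword_id) pairs.
    codebook: dict codeword_id -> list of (class_id, frame_offset, weight) displacements."""
    acc = np.zeros((n_classes, n_frames))
    for frame, cw in keypoints:
        for cls, offset, w in codebook.get(cw, []):
            centre = int(frame + offset)
            if 0 <= centre < n_frames:
                acc[cls, centre] += w        # probabilistic vote for a class centre
    return acc

# acc.max(axis=1) gives one activation per class; these activations can be fed
# to a neural network instead of thresholding the accumulator directly.
```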


International Conference on Acoustics, Speech, and Signal Processing | 2014

A discriminatively trained Hough Transform for frame-level phoneme recognition

Jonathan William Dennis; Huy Dat Tran; Haizhou Li; Eng Siong Chng

Despite recent advances in the use of Artificial Neural Network (ANN) architectures for automatic speech recognition (ASR), relatively little attention has been given to using feature inputs beyond MFCCs in such systems. In this paper, we propose an alternative to conventional MFCC or filterbank features, using an approach based on the Generalised Hough Transform (GHT). The GHT is a common approach used in the field of image processing for the task of object detection, where the idea is to learn the spatial distribution of a codebook of feature information relative to the location of the target class. During recognition, a simple weighted summation of the codebook activations is commonly used to detect the presence of the target classes. Here we propose to learn the weighting discriminatively in an ANN, where the aim is to optimise the static phone classification error at the output of the network. As such an ANN is common to hybrid ASR architectures, the output activations from the GHT can be considered as a novel feature for ASR. Experimental results on the TIMIT phoneme recognition task demonstrate the state-of-the-art performance of the approach.
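
A minimal sketch of the discriminative weighting idea, assuming per-frame codebook activations are already computed: a single softmax layer trained with cross-entropy stands in for the learned weighted summation. The training loop and hyperparameters are illustrative, not the paper's network.

```python
# Hypothetical sketch: learn the codebook weighting discriminatively with a
# softmax layer trained by cross-entropy on frame-level phone labels.
import numpy as np

def train_softmax(activations, labels, n_classes, lr=0.1, epochs=50):
    """activations: (n_frames, n_codewords); labels: (n_frames,) integer phone ids."""
    n, d = activations.shape
    W = np.zeros((d, n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = activations @ W
        logits -= logits.max(axis=1, keepdims=True)            # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        grad = activations.T @ (probs - onehot) / n             # cross-entropy gradient
        W -= lr * grad
    return W
```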


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2014

Enhanced local feature approach for overlapping sound event recognition

Jonathan William Dennis; Huy Dat Tran

In this paper, we propose a feature-based approach to address the challenging task of recognising overlapping sound events from single-channel audio. Our approach is based on our previous work on Local Spectrogram Features (LSFs), where we combined a local spectral representation of the spectrogram with the Generalised Hough Transform (GHT) voting system for recognition. Here we propose to take the output from the GHT and use it as a feature for classification, and demonstrate that such an approach can improve upon the previous knowledge-based scoring system. Experiments are carried out on a challenging set of five overlapping sound events, with the addition of non-stationary background noise and volume change. The results show that the proposed system can achieve detection rates of 99% and 91% in clean and 0 dB noise conditions respectively, which is a strong improvement over our previous work.


International Conference on Signal and Information Processing | 2013

Robust sound event recognition under TV playing conditions

Ng Wen Zheng Terence; Tran Huy Dat; Jonathan William Dennis; Chng Eng Siong

The ability to automatically recognize sound events in real-life conditions is an important part of applications such as acoustic surveillance and smart home automation. The main challenge of these applications is that the sound sources often come from unknown distances under different acoustic environments, which are also noisy and reverberant. Among the noises in the home, the most difficult to deal with are non-stationary interferences such as a TV, radio or music playing. In this paper, we address one of the hardest situations in sound event recognition: the presence of interference under reverberant conditions. Our system takes a dual-microphone approach and consists of two modules: first, a novel regression-based noise cancellation (RNC) stage to reduce the interference, and second, an improved subband power distribution image feature (iSPD-IF) to classify the noise-cancelled signals. A comprehensive experiment is carried out, which demonstrates nearly perfect classification accuracy under severe noisy and reverberant conditions.
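
A rough sketch of regression-based cancellation under the simplifying assumption of a linear FIR mapping fitted on interference-only segments; the RNC mapping in the paper is learned empirically and may be more elaborate, so this is only a stand-in to convey the idea.

```python
# Hypothetical sketch: fit an FIR mapping from the reference (TV-side)
# microphone to the primary microphone on interference-only data by least
# squares, then subtract the predicted interference at test time.
import numpy as np

def fit_fir(ref, primary, n_taps=256):
    """Least-squares FIR filter h so that (ref * h) approximates primary."""
    X = np.stack([np.roll(ref, k) for k in range(n_taps)], axis=1)[n_taps:]
    y = primary[n_taps:]                     # drop wrap-around rows
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h

def cancel(ref, primary, h):
    predicted = np.convolve(ref, h)[: len(primary)]
    return primary - predicted               # residual fed to the iSPD-IF classifier
```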


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2013

A robust sound event recognition framework under TV playing conditions

Ng Wen Zheng Terence; Tran Huy Dat; Jonathan William Dennis; Chng Eng Siong

In this paper, we address the problem of performing sound event recognition in the presence of a television playing in a home environment. Our proposed framework consists of two modules: (1) a novel regression-based noise cancellation (RNC), a preprocessing stage which utilises an additional reference microphone placed near the television to reduce the noise. RNC learns an empirical mapping, instead of using conventional adaptive methods, to achieve better noise reduction. (2) An improved subband power distribution image feature (iSPD-IF), which builds on our existing classification framework by enhancing the feature extraction. A comprehensive experiment is carried out on our recorded data, which demonstrates high classification accuracy under severe television noise.

Collaboration


Dive into Jonathan William Dennis's collaborations.

Top Co-Authors

Haizhou Li

National University of Singapore

Eng Siong Chng

Nanyang Technological University