Publication


Featured research published by Guy J. Brown.


IEEE Transactions on Speech and Audio Processing | 2003

A multipitch tracking algorithm for noisy speech

Mingyang Wu; DeLiang Wang; Guy J. Brown

An effective multipitch tracking algorithm for noisy speech is critical for acoustic signal processing. However, the performance of existing algorithms is not satisfactory. We present a robust algorithm for multipitch tracking of noisy speech. Our approach integrates an improved channel and peak selection method, a new method for extracting periodicity information across different channels, and a hidden Markov model (HMM) for forming continuous pitch tracks. The resulting algorithm can reliably track single and double pitch tracks in a noisy environment. We suggest a pitch error measure for the multipitch situation. The proposed algorithm is evaluated on a database of speech utterances mixed with various types of interference. Quantitative comparisons show that our algorithm significantly outperforms existing ones.
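The HMM stage described in this abstract can be illustrated with a small sketch: given a matrix of per-frame periodicity evidence (standing in for the cross-channel periodicity information), a Viterbi search over quantised pitch states yields a continuous track. This is not the published Wu/Wang/Brown algorithm; the Gaussian transition model, the state quantisation, and all parameter values are assumptions made for illustration.

```python
# Minimal sketch (not the published algorithm): forming a continuous pitch
# track from per-frame periodicity evidence with a simple HMM and Viterbi.
# The salience matrix stands in for cross-channel periodicity information;
# the transition model and parameters are illustrative assumptions.
import numpy as np

def viterbi_pitch_track(salience, sigma=2.0):
    """salience: (n_frames, n_pitch_states) non-negative periodicity evidence."""
    n_frames, n_states = salience.shape
    states = np.arange(n_states)
    # Transition log-probabilities favour small pitch jumps between frames.
    trans = -0.5 * ((states[:, None] - states[None, :]) / sigma) ** 2
    trans -= np.logaddexp.reduce(trans, axis=1, keepdims=True)

    log_obs = np.log(salience + 1e-12)
    delta = log_obs[0].copy()
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + trans          # rows: previous state, cols: next state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]

    # Backtrace the most probable state sequence (one pitch index per frame).
    path = np.zeros(n_frames, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n_frames - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy usage: 50 frames, 40 candidate pitch states, noisy evidence near state 20.
rng = np.random.default_rng(0)
sal = rng.random((50, 40)) * 0.1
sal[np.arange(50), 20 + (np.arange(50) // 10)] += 1.0
print(viterbi_pitch_track(sal)[:10])
```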


IEEE Transactions on Neural Networks | 1999

Separation of speech from interfering sounds based on oscillatory correlation

DeLiang Wang; Guy J. Brown

A multistage neural model is proposed for an auditory scene analysis task--segregating speech from interfering sound sources. The core of the model is a two-layer oscillator network that performs stream segregation on the basis of oscillatory correlation. In the oscillatory correlation framework, a stream is represented by a population of synchronized relaxation oscillators, each of which corresponds to an auditory feature, and different streams are represented by desynchronized oscillator populations. Lateral connections between oscillators encode harmonicity, and proximity in frequency and time. Prior to the oscillator network are a model of the auditory periphery and a stage in which mid-level auditory representations are formed. The model has been systematically evaluated using a corpus of voiced speech mixed with interfering sounds, and produces improvements in terms of signal-to-noise ratio for every mixture. The performance of our model is compared with other studies on computational auditory scene analysis. A number of issues including biological plausibility and real-time implementation are also discussed.
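As an illustration of the oscillator dynamics the framework relies on, the sketch below integrates two excitatorily coupled relaxation oscillators of the Terman-Wang type; synchrony of their activity corresponds to membership of a single stream. The parameter values, the coupling term, and the reduction to two units are assumptions, not the paper's full two-layer network.

```python
# Illustrative sketch of relaxation-oscillator dynamics in the oscillatory
# correlation framework (Terman-Wang style equations). Parameter values and
# the two-oscillator coupling are assumptions, not the full network.
import numpy as np

def simulate(n_steps=20000, dt=0.005, eps=0.02, gamma=6.0, beta=0.1,
             I=(0.8, 0.8), w=0.5):
    """Two excitatorily coupled relaxation oscillators; returns x trajectories."""
    x = np.array([0.1, -1.5])   # excitatory variables (different initial phases)
    y = np.array([0.1, 0.1])    # slow recovery variables
    drive = np.array(I)
    xs = np.empty((n_steps, 2))
    for t in range(n_steps):
        # Coupling: each unit is excited by the other's (sigmoid-gated) activity.
        s = w / (1.0 + np.exp(-10.0 * x[::-1]))
        dx = 3.0 * x - x ** 3 + 2.0 - y + drive + s
        dy = eps * (gamma * (1.0 + np.tanh(x / beta)) - y)
        x, y = x + dt * dx, y + dt * dy
        xs[t] = x
    return xs

xs = simulate()
# After a transient, the excitatory coupling tends to pull the two oscillators
# into (approximate) synchrony, which is how one stream is represented.
print(np.corrcoef(xs[10000:, 0], xs[10000:, 1])[0, 1])
```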


Speech Communication | 2004

A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation

Kalle J. Palomäki; Guy J. Brown; DeLiang Wang

In this study we describe a binaural auditory model for recognition of speech in the presence of spatially separated noise intrusions, under small-room reverberation conditions. The principle underlying the model is to identify time–frequency regions which constitute reliable evidence of the speech signal. This is achieved both by determining the spatial location of the speech source, and by grouping the reliable regions according to common azimuth. Reliable time–frequency regions are passed to a missing data speech recogniser, which performs decoding based on this partial description of the speech signal. In order to obtain robust estimates of spatial location in reverberant conditions, we incorporate some aspects of precedence effect processing into the auditory model. We show that the binaural auditory model improves speech recognition performance in small room reverberation conditions in the presence of spatially separated noise, particularly for conditions in which the spatial separation is 20° or larger. We also demonstrate that the binaural system outperforms a single channel approach, notably in cases where the target speech and noise intrusion have substantial spectral overlap.
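One ingredient of such a binaural front end, localisation of time–frequency units by interaural cross-correlation, can be sketched as follows. The framing, the tolerance on the interaural time difference, and the absence of precedence-effect processing are simplifying assumptions; this is not the published model.

```python
# Hedged sketch: estimate the interaural time difference (ITD) per
# time-frequency unit by cross-correlation and keep only units whose ITD
# agrees with the target's, as a crude "reliable region" mask.
import numpy as np

def itd_by_xcorr(left, right, fs, max_lag_ms=1.0):
    """Return the lag (s) by which the right-ear frame lags the left-ear frame."""
    max_lag = int(fs * max_lag_ms / 1000.0)
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.dot(left, np.roll(right, -lag)) for lag in lags]   # circular, ok for a sketch
    return lags[int(np.argmax(corr))] / fs

def reliable_mask(left_tf, right_tf, fs, target_itd, tol=5e-5):
    """left_tf, right_tf: (n_channels, n_frames, frame_len) filterbank frames."""
    n_ch, n_fr, _ = left_tf.shape
    mask = np.zeros((n_ch, n_fr), dtype=bool)
    for c in range(n_ch):
        for t in range(n_fr):
            itd = itd_by_xcorr(left_tf[c, t], right_tf[c, t], fs)
            mask[c, t] = abs(itd - target_itd) < tol
    return mask   # True = unit dominated by the source at the target azimuth

# Toy usage: right ear delayed by 3 samples at fs = 16 kHz (~0.19 ms ITD).
fs = 16000
sig = np.random.default_rng(1).standard_normal(320)
left = sig.reshape(1, 1, 320)
right = np.roll(sig, 3).reshape(1, 1, 320)
print(reliable_mask(left, right, fs, target_itd=3 / fs))
```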


Speech Communication | 2004

Techniques for handling convolutional distortion with `missing data' automatic speech recognition

Kalle J. Palomäki; Guy J. Brown; Jon Barker

In this study we describe two techniques for handling convolutional distortion with ‘missing data’ speech recognition using spectral features. The missing data approach to automatic speech recognition (ASR) is motivated by a model of human speech perception, and involves the modification of a hidden Markov model (HMM) classifier to deal with missing or unreliable features. Although the missing data paradigm was proposed as a means of handling additive noise in ASR, we demonstrate that it can also be effective in dealing with convolutional distortion. Firstly, we propose a normalisation technique for handling spectral distortions and changes of input level (possibly in the presence of additive noise). The technique computes a normalising factor only from the most intense regions of the speech spectrum, which are likely to remain intact across various noise conditions. We show that the proposed normalisation method improves performance compared to a conventional missing data approach with spectrally distorted and noise contaminated speech, and in conditions where the gain of the input signal varies. Secondly, we propose a method for handling reverberated speech which attempts to identify time-frequency regions that are not badly contaminated by reverberation and have strong speech energy. This is achieved by using modulation filtering to identify ‘reliable’ regions of the speech spectrum. We demonstrate that our approach improves recognition performance in cases where the reverberation time T60 exceeds 0.7 s, compared to a baseline system which uses acoustic features derived from perceptual linear prediction and the modulation-filtered spectrogram.
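The normalisation idea, computing a scaling factor only from the most intense spectral regions, can be sketched in a few lines. The percentile used to select intense units and the log-domain subtraction are illustrative assumptions rather than the paper's exact formulation.

```python
# Sketch of the normalisation idea: derive a per-utterance scaling factor only
# from the most intense spectral regions (which tend to survive additive
# noise), rather than from the whole spectrogram. The percentile and the
# log-domain subtraction are illustrative assumptions.
import numpy as np

def peak_region_normalise(log_spec, percentile=90):
    """log_spec: (n_frames, n_channels) log-compressed spectral features."""
    threshold = np.percentile(log_spec, percentile)
    intense = log_spec >= threshold              # most energetic time-frequency units
    offset = log_spec[intense].mean()            # normalising factor from those units only
    return log_spec - offset, offset

# Toy usage: the same utterance at two input gains normalises to similar features.
rng = np.random.default_rng(2)
spec = rng.random((100, 32)) * 10
a, _ = peak_region_normalise(spec)
b, _ = peak_region_normalise(spec + 6.0)         # gain change in the log domain
print(np.allclose(a, b))
```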


Journal of New Music Research | 1994

Perceptual grouping of musical sounds: A computational model

Guy J. Brown; Martin Cooke

There is substantial recent evidence that the mixture of sounds reaching the ears is subjected to an analysis by early auditory processes, in which acoustic components with common properties are grouped into a single percept. Since most music consists of a mixture of sounds, it might be expected that early auditory processing also mediates the perception of musical sounds. In this article, we present a computational model of some aspects of auditory processing, and investigate the grouping of musical sounds using the model. In particular, a scheme is developed for grouping sounds across time according to their timbres, as represented within a two‐dimensional space of “brightness” and onset asynchrony.
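The "brightness" dimension is commonly approximated by the spectral centroid, and grouping by brightness can then be sketched as clustering notes whose centroids lie close together. The centroid formula is standard; the fixed grouping threshold is an illustrative assumption and not the paper's full scheme.

```python
# Sketch of the "brightness" dimension: the spectral centroid is a standard
# proxy for perceived brightness; notes are then grouped by proximity in it.
# The grouping threshold is an illustrative assumption.
import numpy as np

def spectral_centroid(signal, fs):
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    return np.sum(freqs * spec) / (np.sum(spec) + 1e-12)

def group_by_brightness(notes, fs, threshold_hz=300.0):
    """Assign consecutive notes to the same stream if their centroids are close."""
    centroids = [spectral_centroid(n, fs) for n in notes]
    labels, current = [0], 0
    for prev, cur in zip(centroids, centroids[1:]):
        if abs(cur - prev) > threshold_hz:
            current += 1                      # brightness jump: start a new group
        labels.append(current)
    return labels, centroids

# Toy usage: a dull tone (one harmonic) vs. a bright tone (many harmonics).
fs = 16000
t = np.arange(2048) / fs
dull = np.sin(2 * np.pi * 220 * t)
bright = sum(np.sin(2 * np.pi * 220 * k * t) for k in range(1, 12))
print(group_by_brightness([dull, dull, bright, bright], fs)[0])   # -> [0, 0, 1, 1]
```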


Archive | 2005

Separation of Speech by Computational Auditory Scene Analysis

Guy J. Brown; DeLiang Wang

The term auditory scene analysis (ASA) refers to the ability of human listeners to form perceptual representations of the constituent sources in an acoustic mixture, as in the well-known ‘cocktail party’ effect. Accordingly, computational auditory scene analysis (CASA) is the field of study which attempts to replicate ASA in machines. Some CASA systems are closely modelled on the known stages of auditory processing, whereas others adopt a more functional approach. However, all are broadly based on the principles underlying the perception and organization of sound by human listeners, and in this respect they differ from ICA and other approaches to sound separation. In this chapter, we review the principles underlying ASA and show how they can be implemented in CASA systems. We also consider the link between CASA and automatic speech recognition, and draw distinctions between the CASA and ICA approaches.


IEEE Transactions on Speech and Audio Processing | 2001

A comparison of auditory and blind separation techniques for speech segregation

A.J.W. van der Kouwe; DeLiang Wang; Guy J. Brown

A fundamental problem in auditory and speech processing is the segregation of speech from concurrent sounds. This problem has been a focus of study in computational auditory scene analysis (CASA), and it has also been investigated from the perspective of blind source separation. Using a standard corpus of voiced speech mixed with interfering sounds, we report a comparison between CASA and blind source separation techniques, which have been developed independently. Our comparison reveals that they perform well under very different conditions. A number of conclusions are drawn with respect to their relative strengths and weaknesses in speech segregation applications as well as in modeling auditory function.


international conference on acoustics, speech, and signal processing | 2004

Instrument recognition in accompanied sonatas and concertos

Jana Eggink; Guy J. Brown

A system for musical instrument recognition is introduced. In contrast to most existing systems, it can identify a solo instrument even in the presence of an accompanying keyboard instrument or orchestra. To enable recognition in the presence of a highly polyphonic background, we use features based solely on the partials of the target tone. The approach is based on the assumption that it is possible to extract the most prominent fundamental frequency and the corresponding harmonic overtone series, and that these most often belong to the solo instrument. Classification is carried out using a Gaussian classifier trained on examples of monophonic music. Testing our system on accompanied sonatas and concertos we achieved a recognition rate of 86% for 5 different instruments, an accuracy comparable to that of systems limited to monophonic music only.
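A rough sketch of the partial-based features: given an estimated fundamental frequency, the magnitude spectrum is sampled at the harmonics of the target tone and the resulting vector is scored against per-instrument Gaussians. The F0 estimate, the number of partials, the diagonal covariances, and the toy "trained" models below are assumptions for illustration, not the paper's exact classifier.

```python
# Rough sketch: sample the magnitude spectrum at the harmonic partials of the
# target tone (given an F0 estimate) and score the feature vector against
# per-instrument Gaussians. Model parameters here are made up for illustration.
import numpy as np

def partial_features(signal, fs, f0, n_partials=10):
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    feats = []
    for k in range(1, n_partials + 1):
        bin_k = int(np.argmin(np.abs(freqs - k * f0)))   # nearest bin to k-th partial
        feats.append(spec[bin_k])
    feats = np.log(np.array(feats) + 1e-9)
    return feats - feats.max()                           # level-normalise

def log_gaussian(x, mean, var):
    """Diagonal-covariance Gaussian log-likelihood."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

# Toy usage with two made-up instrument models (means/variances would be
# trained from monophonic recordings in practice).
fs, f0 = 16000, 440.0
t = np.arange(4096) / fs
tone = sum((0.7 ** k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 11))
x = partial_features(tone, fs, f0)
models = {"violin": (x + 0.1, np.full(10, 0.5)),     # hypothetical parameters
          "flute":  (x - 2.0, np.full(10, 0.5))}
print(max(models, key=lambda m: log_gaussian(x, *models[m])))
```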


international conference on acoustics, speech, and signal processing | 2003

A missing feature approach to instrument identification in polyphonic music

Jana Eggink; Guy J. Brown

Gaussian mixture model (GMM) classifiers have been shown to give good instrument recognition performance for monophonic music played by a single instrument. However, many applications (such as automatic music transcription) require instrument identification from polyphonic, multi-instrumental recordings. We address this problem by incorporating ideas from missing feature theory into a GMM classifier. Specifically, frequency regions that are dominated by energy from an interfering tone are marked as unreliable and excluded from the classification process. This approach has been evaluated on random two-tone chords and an excerpt from a commercially available compact disc, with promising results.
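The missing-feature idea can be sketched with a diagonal-covariance GMM in which channels flagged as dominated by an interfering tone are simply marginalised out of each component's density. The component count, the parameters, the hard reliability mask, and the class names below are illustrative assumptions.

```python
# Minimal sketch of missing-feature classification with a diagonal-covariance
# GMM: unreliable frequency channels are marginalised out of each component's
# density. Parameters and class names are illustrative assumptions.
import numpy as np

def gmm_log_likelihood(x, reliable, weights, means, variances):
    """x: (d,) features; reliable: (d,) bool mask; means/variances: (m, d)."""
    xr = x[reliable]
    mr = means[:, reliable]
    vr = variances[:, reliable]
    # Per-component log density over the reliable dimensions only.
    comp = -0.5 * np.sum(np.log(2 * np.pi * vr) + (xr - mr) ** 2 / vr, axis=1)
    return np.logaddexp.reduce(np.log(weights) + comp)

# Toy usage: two "instrument" GMMs scored on a feature vector whose last two
# channels are marked unreliable (overlapped by an interfering tone).
rng = np.random.default_rng(3)
d, m = 8, 3
x = rng.standard_normal(d)
reliable = np.array([True] * 6 + [False] * 2)
for name in ("oboe", "cello"):                    # hypothetical class names
    w = np.full(m, 1.0 / m)
    mu = rng.standard_normal((m, d))
    var = np.full((m, d), 1.0)
    print(name, gmm_log_likelihood(x, reliable, w, mu, var))
```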


Endeavour | 1993

Computational auditory scene analysis: listening to several things at once

Martin Cooke; Guy J. Brown; Malcolm Crawford; Phil D. Green

The problem of distinguishing particular sounds, such as conversation, against a background of irrelevant noise is a matter of common experience. Psychologists have studied it for some 40 years, but it is only comparatively recently that computer modelling of the phenomenon has been attempted. This article reviews progress made, possible practical applications, and prospects for the future.

Collaboration


Dive into Guy J. Brown's collaborations.

Top Co-Authors

Martin Cooke
University of the Basque Country

Jon Barker
University of Sheffield

Ning Ma
University of Sheffield

Bill Wells
University of Sheffield

Emina Kurtic
University of Sheffield

Simon Tucker
University of Sheffield