Publication


Featured research published by Mark R. P. Thomas.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Inference of Room Geometry From Acoustic Impulse Responses

Fabio Antonacci; Jason Filos; Mark R. P. Thomas; Emanuel A. P. Habets; Augusto Sarti; Patrick A. Naylor; Stefano Tubaro

Acoustic scene reconstruction is a process that aims to infer characteristics of the environment from acoustic measurements. We investigate the problem of locating planar reflectors in rooms, such as walls and furniture, from signals obtained using distributed microphones. Specifically, localization of multiple two-dimensional (2-D) reflectors is achieved by estimating the time of arrival (TOA) of reflected signals through analysis of acoustic impulse responses (AIRs). The estimated TOAs are converted into elliptical constraints on the location of the line reflector, which is then localized by combining multiple constraints. When multiple walls are present in the acoustic scene, an ambiguity problem arises, which we show can be addressed using the Hough transform. Additionally, the Hough transform significantly improves the robustness of the estimation for noisy measurements. The proposed approach is evaluated using simulated rooms under a variety of controlled conditions where the floor and ceiling are perfectly absorbing. Results using AIRs measured in a real environment are also given. Additionally, results showing the robustness to additive noise in the TOA information are presented, with particular reference to the improvement achieved through the use of the Hough transform.
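
As an illustration of the Hough-transform step, here is a minimal sketch (not the authors' implementation): each TOA defines an ellipse, with foci at the source and microphone, to which the reflector must be tangent, so every tangent line of every ellipse casts a vote in a discretized (theta, rho) line-parameter space and the true reflector emerges as a peak. The grid resolutions and the speed-of-sound default are assumed parameters.

```python
import numpy as np

def hough_reflector(sources, mics, toas, c=343.0,
                    n_theta=180, n_rho=200, rho_max=10.0, n_pts=360):
    """Locate a line reflector as a peak in (theta, rho) Hough space.

    sources, mics: sequences of 2-D positions; toas: reflection times of
    arrival, one per source/mic pair, extracted from the AIRs.
    """
    acc = np.zeros((n_theta, n_rho))
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    for src, mic, tau in zip(sources, mics, toas):
        src, mic = np.asarray(src, float), np.asarray(mic, float)
        a = c * tau / 2.0                      # semi-major axis
        f = np.linalg.norm(mic - src) / 2.0    # focal half-distance
        b = np.sqrt(max(a ** 2 - f ** 2, 1e-12))
        ctr = (src + mic) / 2.0
        phi = np.arctan2(mic[1] - src[1], mic[0] - src[0])
        R = np.array([[np.cos(phi), -np.sin(phi)],
                      [np.sin(phi),  np.cos(phi)]])
        for t in np.linspace(0.0, 2 * np.pi, n_pts, endpoint=False):
            p = ctr + R @ np.array([a * np.cos(t), b * np.sin(t)])
            d = R @ np.array([-a * np.sin(t), b * np.cos(t)])  # tangent
            theta = np.arctan2(d[0], -d[1]) % np.pi  # normal direction
            rho = p @ np.array([np.cos(theta), np.sin(theta)])
            ti = int(theta / np.pi * n_theta) % n_theta
            ri = int((rho + rho_max) / (2 * rho_max) * n_rho)
            if 0 <= ri < n_rho:
                acc[ti, ri] += 1.0               # vote for tangent line
    ti, ri = np.unravel_index(np.argmax(acc), acc.shape)
    return thetas[ti], -rho_max + (ri + 0.5) * 2 * rho_max / n_rho
```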


IEEE Transactions on Audio, Speech, and Language Processing | 2009

The SIGMA Algorithm: A Glottal Activity Detector for Electroglottographic Signals

Mark R. P. Thomas; Patrick A. Naylor

Accurate estimation of glottal closure instants (GCIs) and opening instants (GOIs) is important for speech processing applications that benefit from glottal-synchronous processing. The majority of existing approaches detect GCIs by comparing the differentiated electroglottographic (EGG) signal to a threshold and are able to provide accurate results during voiced speech. More recent algorithms use a similar approach across multiple dyadic scales using the stationary wavelet transform. All existing approaches are, however, prone to errors around the transition regions at the ends of voiced segments of speech. This paper describes a new method for EGG-based glottal activity detection which exhibits high accuracy over the entirety of voiced segments, including, in particular, the transition regions, thereby giving a significant improvement over existing methods. Following a stationary wavelet transform-based preprocessor, detection of excitation due to glottal closure is performed using a group delay function, and true and false detections are then discriminated by Gaussian mixture modeling. GOI detection involves additional processing using the estimated GCIs. The main purpose of our algorithm is to provide a ground truth for GCIs and GOIs. This is essential in order to evaluate algorithms that estimate GCIs and GOIs from the speech signal only, and is also of high value in the analysis of pathological speech, where knowledge of GCIs and GOIs is often needed. We compare our algorithm with two previous algorithms against a hand-labeled database. Evaluation has shown an average GCI hit rate of 99.47% and a GOI hit rate of 99.35%, compared to 96.08% and 92.54% for the best-performing existing algorithm.
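
The group delay function at the core of such detectors can be sketched as below. This is an illustrative reimplementation, not the published SIGMA code: the stationary wavelet preprocessor and the GMM discrimination of true and false detections are omitted, and the frame length is an assumed parameter.

```python
import numpy as np

def average_group_delay(x, frame_len=40):
    """Energy-weighted average group delay over a sliding window.

    For an impulsive excitation inside the window, the energy-weighted
    mean sample index tracks the impulse position; subtracting the
    window centre yields a phase-slope function whose negative-going
    zero crossings mark candidate glottal closure instants.
    """
    n = np.arange(frame_len)
    gd = np.full(len(x), np.nan)
    for i in range(len(x) - frame_len):
        w = x[i:i + frame_len] ** 2
        e = w.sum()
        if e > 0.0:
            gd[i + frame_len // 2] = (n * w).sum() / e - (frame_len - 1) / 2
    return gd

def candidate_gcis(gd):
    """Indices of negative-going zero crossings of the group delay."""
    s = np.sign(gd)
    return np.where((s[:-1] > 0) & (s[1:] <= 0))[0]
```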


Journal of the Acoustical Society of America | 2012

Rigid sphere room impulse response simulation: Algorithm and applications

Daniel P. Jarrett; Emanuel A. P. Habets; Mark R. P. Thomas; Patrick A. Naylor

Simulated room impulse responses have proven both useful and indispensable for comprehensive testing of acoustic signal processing algorithms while controlling parameters such as the reverberation time, room dimensions, and source-array distance. In this work, a method is proposed for simulating the room impulse responses between a sound source and microphones positioned on a spherical array. The method takes into account specular reflections of the source by employing the well-known image method, and scattering from the rigid sphere by employing spherical harmonic decomposition. Pseudocode for the proposed method is provided, taking into account various optimizations to reduce the computational complexity. The magnitude and phase errors that result from the finite-order spherical harmonic decomposition are analyzed, and general guidelines for the order selection are provided. Three examples are presented: an analysis of a diffuse reverberant sound field, a study of binaural cues in the presence of reverberation, and an illustration of the algorithm's use as a mouth simulator.
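
The scattering term can be illustrated with the rigid-sphere mode strength that weights each order of the spherical harmonic decomposition. A minimal sketch, assuming microphones on the surface of the sphere and the h_n = j_n + i*y_n Hankel convention (the paper's exact formulation and sign conventions may differ):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def rigid_sphere_mode_strength(n, ka):
    """Mode strength b_n(ka) for sensors on a rigid sphere of radius a.

    The rigid baffle forces the radial particle velocity on the surface
    to zero, giving b_n = j_n(ka) - j_n'(ka) / h_n'(ka) * h_n(ka), with
    h_n the spherical Hankel function of the first kind.
    """
    jn = spherical_jn(n, ka)
    jnp = spherical_jn(n, ka, derivative=True)
    hn = jn + 1j * spherical_yn(n, ka)
    hnp = jnp + 1j * spherical_yn(n, ka, derivative=True)
    return jn - jnp / hnp * hn

# |b_n| decays rapidly once n exceeds ka, which is why truncating the
# spherical harmonic sum at an order of roughly ceil(ka) is a common
# guideline.
```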


International Conference on Acoustics, Speech, and Signal Processing | 2014

HRTF magnitude synthesis via sparse representation of anthropometric features

Piotr Tadeusz Bilinski; Jens Ahrens; Mark R. P. Thomas; Ivan Tashev; John Platt

We propose a method for the synthesis of the magnitudes of Head-Related Transfer Functions (HRTFs) using a sparse representation of anthropometric features. Our approach treats the HRTF synthesis problem as finding a sparse representation of the subject's anthropometric features with respect to the anthropometric features in the training set. The fundamental assumption is that the magnitudes of a given HRTF set can be described by the same sparse combination as the anthropometric data. Thus, we learn a sparse vector that represents the subject's anthropometric features as a linear superposition of the anthropometric features of a small subset of subjects from the training data. Then, we apply the same sparse vector directly to the HRTF tensor data. For evaluation purposes we use a new dataset containing both anthropometric features and HRTFs. We compare the proposed sparse-representation-based approach with ridge regression and with the data of a manikin (which was designed based on average anthropometric data), and we simulate the best and the worst possible classifiers to select one of the HRTFs from the dataset. For instrumental evaluation we use log-spectral distortion. Experiments show that our sparse representation outperforms all other evaluated techniques, and that the synthesized HRTFs are almost as good as the best possible HRTF classifier.
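
A minimal sketch of the idea using an off-the-shelf L1-regularized solver; the paper's exact sparsity formulation, feature normalization and regularization weight are not reproduced here, and `alpha` is an assumption:

```python
import numpy as np
from sklearn.linear_model import Lasso

def synthesize_hrtf_magnitudes(anthro_train, hrtf_train, anthro_new,
                               alpha=0.01):
    """Sparse-representation HRTF magnitude synthesis (sketch).

    anthro_train: (n_subjects, n_features) anthropometric training data
    hrtf_train:   (n_subjects, n_directions, n_bins) HRTF magnitudes
    anthro_new:   (n_features,) features of the unseen subject
    """
    # Find a sparse w with anthro_new ~= anthro_train.T @ w, i.e. the
    # new subject as a superposition of a few training subjects.
    model = Lasso(alpha=alpha, fit_intercept=False, max_iter=50_000)
    model.fit(anthro_train.T, anthro_new)
    w = model.coef_                      # mostly zeros if alpha is apt
    # Apply the identical sparse combination to the HRTF tensor.
    return np.tensordot(w, hrtf_train, axes=1)
```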


International Conference on Acoustics, Speech, and Signal Processing | 2009

Data-driven voice source waveform modelling

Mark R. P. Thomas; Jon Gudnason; Patrick A. Naylor

This paper presents a data-driven approach to the modelling of voice source waveforms. The voice source is a signal that is estimated by inverse-filtering speech signals with an estimate of the vocal tract filter. It is used in speech analysis, synthesis, recognition and coding to decompose a speech signal into its source and vocal tract filter components. Existing approaches parameterize the voice source signal with physically- or mathematically-motivated models. Though the models are well-defined, estimation of their parameters is not well understood and few are capable of reproducing the large variety of voice source waveforms. Here we present a data-driven approach that classifies types of voice source waveforms based upon their mel-frequency cepstrum coefficients with Gaussian mixture modelling. A set of “prototype” waveform classes is derived from a weighted average of voice source cycles from real data. An unknown speech signal is then decomposed into its prototype components and resynthesized. Results indicate that with sixteen voice source classes, low resynthesis errors can be achieved.
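
The classification step might be sketched as follows, assuming the voice source cycles have already been obtained by inverse filtering, time-aligned and length-normalized; the MFCC computation itself and the exact weighting scheme are simplified stand-ins for the paper's method:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def prototype_waveforms(cycle_mfccs, cycle_waveforms, n_classes=16):
    """Cluster voice source cycles by MFCCs and build prototype classes.

    cycle_mfccs:     (n_cycles, n_coeffs) MFCCs of the cycles
    cycle_waveforms: (n_cycles, cycle_len) aligned, length-normalized
                     voice source cycles
    Returns (n_classes, cycle_len): each prototype is the average of
    all cycles weighted by their posterior class membership.
    """
    gmm = GaussianMixture(n_components=n_classes, covariance_type="diag",
                          random_state=0).fit(cycle_mfccs)
    resp = gmm.predict_proba(cycle_mfccs)        # (n_cycles, n_classes)
    weights = resp / resp.sum(axis=0, keepdims=True)
    return weights.T @ cycle_waveforms           # weighted averages
```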


Workshop on Applications of Signal Processing to Audio and Acoustics | 2007

A Practical Multichannel Dereverberation Algorithm using Multichannel DYPSA and Spatiotemporal Averaging

Mark R. P. Thomas; Nikolay D. Gaubitch; Jon Gudnason; Patrick A. Naylor

Speech signals for hands-free telecommunication applications are received by one or more microphones placed at some distance from the talker. In an office environment, for example, unwanted signals such as reverberation and background noise from computers and other talkers will degrade the quality of the received signal. These unwanted components have an adverse effect upon speech processing algorithms and impair intelligibility. This paper demonstrates the use of the Multichannel DYPSA algorithm to identify glottal closure instants (GCIs) from noisy, reverberant speech. Using the estimated GCIs, a spatiotemporal averaging technique is applied to attenuate the unwanted components. Experiments with a microphone array demonstrate the dereverberation and noise suppression of the spatiotemporal averaging method, showing up to a 5 dB improvement in segmental SNR and a 0.33 improvement in normalized Bark spectral distortion score.
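
A rough sketch of the GCI-synchronous averaging stage, assuming time-aligned channels and GCIs already estimated (for example by the Multichannel DYPSA algorithm); the cycle resampling and edge handling here are illustrative simplifications of the published method:

```python
import numpy as np

def spatiotemporal_average(channels, gcis, n_cycles=2):
    """Average larynx cycles across microphones and neighbouring cycles.

    channels: (n_mics, n_samples) time-aligned microphone signals
    gcis:     increasing sample indices of glottal closure instants
    The direct-path speech is near-identical from cycle to cycle and
    adds coherently, while reverberation and noise average towards zero.
    """
    spatial = channels.mean(axis=0)              # delay-and-sum stage
    out = np.zeros(channels.shape[1])
    for k in range(len(gcis) - 1):
        lo, hi = gcis[k], gcis[k + 1]
        seg = np.zeros(hi - lo)
        count = 0
        for j in range(max(0, k - n_cycles),
                       min(len(gcis) - 1, k + n_cycles + 1)):
            a, b = gcis[j], gcis[j + 1]
            # resample the neighbouring cycle to the current cycle length
            seg += np.interp(np.linspace(0.0, 1.0, hi - lo),
                             np.linspace(0.0, 1.0, b - a), spatial[a:b])
            count += 1
        out[lo:hi] = seg / count
    return out
```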


International Conference on Acoustics, Speech, and Signal Processing | 2011

Simulating room impulse responses for spherical microphone arrays

Daniel P. Jarrett; Emanuel A. P. Habets; Mark R. P. Thomas; Patrick A. Naylor

A method is proposed for simulating the sound pressure signals on a spherical microphone array in a reverberant enclosure. The method employs spherical harmonic decomposition and takes into account scattering from a solid sphere. An analysis shows that the error in the decomposition can be made arbitrarily small given a sufficient number of spherical harmonics.
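
That claim can be checked numerically for an open sphere via the order-limited plane-wave expansion; a small sketch in which the angular grid and the error norm are choices made for the example:

```python
import numpy as np
from scipy.special import spherical_jn, eval_legendre

def truncation_error(kr, order):
    """Relative error of an order-limited plane-wave expansion at kr.

    The pressure on an open sphere admits the expansion
    exp(i kr cos t) = sum_n i^n (2n+1) j_n(kr) P_n(cos t); increasing
    the truncation order (roughly beyond kr) drives the error towards
    zero.
    """
    t = np.linspace(0.0, np.pi, 181)
    exact = np.exp(1j * kr * np.cos(t))
    approx = sum((1j ** n) * (2 * n + 1) * spherical_jn(n, kr)
                 * eval_legendre(n, np.cos(t)) for n in range(order + 1))
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)

# e.g. truncation_error(5.0, 3) is large, while truncation_error(5.0, 10)
# is already tiny.
```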


Speech Communication | 2012

Data-driven voice source waveform analysis and synthesis

Jon Gudnason; Mark R. P. Thomas; Daniel P. W. Ellis; Patrick A. Naylor

A data-driven approach is introduced for studying, analyzing and processing the voice source signal. Existing approaches parameterize the voice source signal using models that are motivated, for example, by a physical model or by function-fitting. Such parameterization is often difficult to achieve and produces a poor approximation to the large variety of real voice source waveforms of the human voice. This paper presents a novel data-driven approach to analyzing different types of voice source waveforms using principal component analysis and Gaussian mixture modeling. This approach models certain voice source features that many other approaches fail to model. Prototype voice source waveforms are obtained from each mixture component and analyzed with respect to speaker, phone and pitch. An analysis/synthesis scheme was set up to demonstrate the effectiveness of the method. Compression of the proposed voice source representation by discarding 75% of the features yields a segmental signal-to-reconstruction error ratio of 13 dB and a Bark spectral distortion of 0.14.
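
A minimal sketch of the compression experiment, assuming length-normalized cycles as input; keeping 25% of the principal components mirrors the paper's setting of discarding 75% of the features, though the exact feature set and error measures may differ:

```python
import numpy as np
from sklearn.decomposition import PCA

def compress_voice_source(cycles, keep=0.25):
    """Discard a fraction of PCA features and measure reconstruction SNR.

    cycles: (n_cycles, cycle_len) length-normalized voice source cycles
    (there must be at least as many cycles as retained components).
    """
    n_comp = max(1, int(keep * cycles.shape[1]))
    pca = PCA(n_components=n_comp).fit(cycles)
    recon = pca.inverse_transform(pca.transform(cycles))
    err = cycles - recon
    snr = 10 * np.log10(np.sum(cycles ** 2, axis=1)
                        / np.maximum(np.sum(err ** 2, axis=1), 1e-12))
    return recon, snr.mean()   # reconstruction, segmental SNR in dB
```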


Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011

Dereverberation performance of rigid and open spherical microphone arrays: Theory & simulation

Daniel P. Jarrett; Emanuel A. P. Habets; Mark R. P. Thomas; Nikolay D. Gaubitch; Patrick A. Naylor

Linear microphone arrays have been extensively used for dereverberation. In this paper we look at the dereverberation performance of two types of spherical microphone array: the open array (microphones suspended in free space) and the rigid array (microphones mounted on a rigid baffle). Dereverberation is performed in the spherical harmonic domain using a technique similar to the commonly used delay-and-sum beamformer (DSB). We analyse the theoretical performance with respect to the direct-to-reverberant ratio (DRR), and we also present simulation results obtained using a simulation tool for spherical arrays. The performance of the spherical DSB is found to increase with the radius of the sphere, and to be 1–2 dB higher for the rigid array. These results serve as a baseline for evaluating the performance of future dereverberation algorithms for spherical arrays.
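
The spherical-harmonic-domain beamforming stage might look as follows: a sketch assuming the eigenbeams (spherical Fourier coefficients of the sound pressure) and the array's mode strengths are already available, and using scipy's spherical harmonic convention:

```python
import numpy as np
from scipy.special import sph_harm

def sh_domain_dsb(p_nm, b_n, look_dir):
    """Delay-and-sum-like beamformer in the spherical harmonic domain.

    p_nm:     dict {(n, m): coefficient} eigenbeams at one frequency
    b_n:      mode strengths per order (rigid or open sphere)
    look_dir: (azimuth, inclination) steering direction in radians

    Weighting each eigenbeam by conj(Y_nm(look_dir)) / b_n equalizes
    the mode strengths and steers a beam towards look_dir, analogous
    to a delay-and-sum beamformer in the spatial domain.
    """
    az, incl = look_dir
    return sum(np.conj(sph_harm(m, n, az, incl)) / b_n[n] * coeff
               for (n, m), coeff in p_nm.items())
```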


International Conference on Acoustics, Speech, and Signal Processing | 2010

Voice source estimation for artificial bandwidth extension of telephone speech

Mark R. P. Thomas; Jon Gudnason; Patrick A. Naylor; Bernd Geiser; Peter Vary

Artificial bandwidth extension (ABWE) of speech signals aims to estimate wideband speech (50 Hz – 7 kHz) from narrowband signals (300 Hz – 3.4 kHz). Applying the source-filter model of speech, many existing algorithms estimate vocal tract filter parameters independently of the source signal. However, many current methods for extending the narrowband voice source signal are limited to straightforward signal processing techniques which are only effective for high-band estimation. This paper presents a method for ABWE that employs novel data-driven modelling and an existing spectral mirroring technique to estimate the wideband source signal in both the high and low extension bands. A state-of-the-art Hidden Markov Model-based estimator evaluates the temporal and spectral envelopes in the missing frequency bands, with which the ABWE speech signal is synthesized. Informal listening tests comparing two existing source estimation techniques and two permutations of the proposed approach show an improvement in the perceived bandwidth of speech signals, in particular towards low frequencies. Subjective tests on the same data show a preference for the proposed techniques over the existing methods under test.
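
The spectral mirroring component is the simplest ingredient: inserting a zero after every sample doubles the sampling rate without interpolation and folds the 0-4 kHz spectrum into the 4-8 kHz band. A sketch (variable names are illustrative):

```python
import numpy as np

def spectral_mirror(excitation_nb):
    """Spectral folding of a narrowband excitation signal.

    Zero insertion doubles the sampling rate without low-pass
    interpolation, so the baseband spectrum reappears mirrored in the
    new high band, where it can be shaped by the estimated envelope.
    The missing low band cannot be recreated this way, which is why
    the paper pairs mirroring with a data-driven source model.
    """
    wb = np.zeros(2 * len(excitation_nb))
    wb[::2] = excitation_nb
    return wb
```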

Collaboration


Dive into Mark R. P. Thomas's collaborations.

Top Co-Authors

Nikolay D. Gaubitch
Delft University of Technology

Jon Gudnason
Imperial College London

Emanuel A. P. Habets
University of Erlangen-Nuremberg

Felicia Lim
Imperial College London

Jason Filos
Imperial College London