Publication


Featured research published by Belinda Marie Schwerin.


Speech Communication | 2010

Single-channel speech enhancement using spectral subtraction in the short-time modulation domain

Kuldip Kumar Paliwal; Kamil Wojcicki; Belinda Marie Schwerin

In this paper we investigate the modulation domain as an alternative to the acoustic domain for speech enhancement. More specifically, we wish to determine how competitive the modulation domain is for spectral subtraction as compared to the acoustic domain. For this purpose, we extend the traditional analysis-modification-synthesis framework to include modulation domain processing. We then compensate the noisy modulation spectrum for additive noise distortion by applying the spectral subtraction algorithm in the modulation domain. Using an objective speech quality measure as well as formal subjective listening tests, we show that the proposed method results in improved speech quality. Furthermore, the proposed method achieves better noise suppression than the MMSE method. In this study, the effect of modulation frame duration on speech quality of the proposed enhancement method is also investigated. The results indicate that modulation frame durations of 180-280 ms provide a good compromise between different types of spectral distortions, namely musical noise and temporal slurring. Thus, given a proper selection of modulation frame duration, the proposed modulation spectral subtraction does not suffer from musical noise artifacts typically associated with acoustic spectral subtraction. In order to achieve further improvements in speech quality, we also propose and investigate fusion of modulation spectral subtraction with the MMSE method. The fusion is performed in the short-time spectral domain by combining the magnitude spectra of the above speech enhancement algorithms. Subjective and objective evaluation of the speech enhancement fusion shows consistent speech quality improvements across input SNRs.
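The two-stage analysis described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame lengths and hops (given in samples), and the parameters rho (over-subtraction factor), gamma (exponent) and beta (spectral floor) are all assumptions chosen for illustration, and the subtraction rule is the generic floored spectral subtraction form rather than anything confirmed by the abstract. Synthesis (inverse transforms and overlap-add) is omitted.

```python
import numpy as np

def stft(x, frame_len, hop):
    """Split a 1-D signal into Hamming-windowed frames and FFT each frame."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)

def modulation_spectra(x, ac_len=256, ac_hop=64, mod_len=32, mod_hop=4):
    """Acoustic STFT, then a second STFT along time for each frequency bin.

    Returns an array of shape (acoustic_bins, modulation_frames, modulation_bins).
    All frame sizes are illustrative assumptions, in samples or frames.
    """
    acoustic_mag = np.abs(stft(x, ac_len, ac_hop))  # (frames, bins)
    return np.stack([stft(acoustic_mag[:, k], mod_len, mod_hop)
                     for k in range(acoustic_mag.shape[1])])

def spectral_subtract(noisy_mag, noise_mag, rho=1.0, gamma=2.0, beta=0.002):
    """Generic floored spectral subtraction on magnitude spectra.

    Subtracts a scaled noise estimate in the gamma-power domain and never
    lets the result drop below beta times the noise estimate.
    """
    diff = noisy_mag ** gamma - rho * noise_mag ** gamma
    floor = beta * noise_mag ** gamma
    return np.maximum(diff, floor) ** (1.0 / gamma)
```

In a full system the modified modulation magnitude would be recombined with the modulation phase, inverted back to an acoustic magnitude spectrum, and then combined with the acoustic phase for resynthesis.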


Speech Communication | 2012

Speech enhancement using a minimum mean-square error short-time spectral modulation magnitude estimator

Kuldip Kumar Paliwal; Belinda Marie Schwerin; Kamil Wojcicki

In this paper we investigate the enhancement of speech by applying MMSE short-time spectral magnitude estimation in the modulation domain. For this purpose, the traditional analysis-modification-synthesis framework is extended to include modulation domain processing. We compensate the noisy modulation spectrum for additive noise distortion by applying the MMSE short-time spectral magnitude estimation algorithm in the modulation domain. A number of subjective experiments were conducted. Initially, we determine the parameter values that maximise the subjective quality of stimuli enhanced using the MMSE modulation magnitude estimator. Next, we compare the quality of stimuli processed by the MMSE modulation magnitude estimator to those processed using the MMSE acoustic magnitude estimator and the modulation spectral subtraction method, and show that good improvement in speech quality is achieved through use of the proposed approach. Then we evaluate the effect of including speech presence uncertainty and log-domain processing on the quality of enhanced speech, and find that this method works better with speech uncertainty. Finally we compare the quality of speech enhanced using the MMSE modulation magnitude estimator (when used with speech presence uncertainty) with that enhanced using different acoustic domain MMSE magnitude estimator formulations, and those enhanced using different modulation domain based enhancement algorithms. Results of these tests show that the MMSE modulation magnitude estimator improves the quality of processed stimuli, without introducing musical noise or spectral smearing distortion. The proposed method is shown to have better noise suppression than MMSE acoustic magnitude estimation, and improved speech quality compared to other modulation domain based enhancement methods considered.


Speech Communication | 2011

Role of modulation magnitude and phase spectrum towards speech intelligibility

Kuldip Kumar Paliwal; Belinda Marie Schwerin; Kamil Wojcicki

In this paper our aim is to investigate the properties of the modulation domain and more specifically, to evaluate the relative contributions of the modulation magnitude and phase spectra towards speech intelligibility. For this purpose, we extend the traditional (acoustic domain) analysis-modification-synthesis framework to include modulation domain processing. We use this framework to construct stimuli that retain only selected spectral components, for the purpose of objective and subjective intelligibility tests. We conduct three experiments. In the first, we investigate the relative contributions to intelligibility of the modulation magnitude, modulation phase, and acoustic phase spectra. In the second experiment, the effect of modulation frame duration on intelligibility for processing of the modulation magnitude spectrum is investigated. In the third experiment, the effect of modulation frame duration on intelligibility for processing of the modulation phase spectrum is investigated. Results of these experiments show that both the modulation magnitude and phase spectra are important for speech intelligibility, and that significant improvement is gained by the inclusion of acoustic phase information. They also show that smaller modulation frame durations improve intelligibility when processing the modulation magnitude spectrum, while longer frame durations improve intelligibility when processing the modulation phase spectrum.
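As a rough illustration of stimuli that "retain only selected spectral components", the Python sketch below keeps either the magnitude or the phase of a complex spectrum. The function names are hypothetical, and replacing the discarded phase with zero (or the discarded magnitude with one) is only one possible choice; the abstract does not say how the authors replaced the discarded component.

```python
import numpy as np

def magnitude_only(spectrum):
    """Keep the magnitude and replace every phase value with zero."""
    return np.abs(spectrum).astype(complex)

def phase_only(spectrum):
    """Keep the phase and set every magnitude to one."""
    return np.exp(1j * np.angle(spectrum))

# Example on the spectrum of one windowed frame of a sinusoid.
frame = np.hamming(32) * np.sin(2 * np.pi * 0.1 * np.arange(32))
spectrum = np.fft.rfft(frame)
mag_stimulus = magnitude_only(spectrum)
phase_stimulus = phase_only(spectrum)
```

The same two operations could be applied to a modulation spectrum or an acoustic spectrum, depending on which component is under test.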


Speech Communication | 2014

Using STFT real and imaginary parts of modulation signals for MMSE-based speech enhancement

Belinda Marie Schwerin; Kuldip Kumar Paliwal

In this paper we investigate an alternate, RI-modulation (R=real, I=imaginary) AMS framework for speech enhancement, in which the real and imaginary parts of the modulation signal are processed in secondary AMS procedures. This framework offers theoretical advantages over the previously proposed modulation AMS frameworks in that noise is additive in the modulation signal and noisy acoustic phase is not used to reconstruct speech. Using the MMSE magnitude estimation to modify modulation magnitude spectra, initial experiments presented in this work evaluate whether these advantages translate into improvements in processed speech quality. The effect of speech presence uncertainty and log-domain processing on MMSE magnitude estimation in the RI-modulation framework is also investigated. Finally, a comparison of different enhancement approaches applied in the RI-modulation framework is presented. Using subjective and objective experiments as well as spectrogram analysis, we show that RI-modulation MMSE magnitude estimation with speech presence uncertainty produces stimuli that listeners prefer over those of the other RI-modulation types. In comparison with similar approaches in the modulation AMS framework, results showed that the theoretical advantages of the RI-modulation framework did not translate to an improvement in overall quality, with both frameworks yielding very similar sounding stimuli, but a clear improvement (compared to the corresponding modulation AMS based approach) in speech intelligibility was found.


Speech Communication | 2012

Improving objective intelligibility prediction by combining correlation and coherence based methods with a measure based on the negative distortion ratio

Angel M. Gomez; Belinda Marie Schwerin; Kuldip Kumar Paliwal

In this paper we propose a novel objective method for intelligibility prediction of enhanced speech which is based on the negative distortion ratio (NDR) - that is, the amount of power spectra that has been removed in comparison to the original clean speech signal, likely due to a bad noise estimate during the speech enhancement procedure. While negative spectral distortions can have a significant importance in subjective intelligibility assessment of processed speech, most of the objective measures in the literature do not account well for this type of distortion. The proposed method focuses on a very specific type of noise, so it is not intended to be used alone but in combination with other techniques, to jointly achieve a better intelligibility prediction. In order to find an appropriate technique to combine it with, in this paper we also review a number of recently proposed methods based on correlation and coherence measures. These methods have already shown a high correlation with human recognition scores, as they effectively detect the presence of nonlinearities, frequently found in noise-suppressed speech. However, when these techniques are jointly applied with the proposed method, significantly higher correlations (above r=0.9) are shown to be achieved.


International Conference on Signal Processing and Communication Systems | 2008

Local-DCT features for facial recognition

Belinda Marie Schwerin; Kuldip Kumar Paliwal

This paper presents the results of a project investigating the use of discrete-cosine transform (DCT) to represent facial images of an identity recognition system. The objective was to address the problem of sensitivity to variation in illumination, faced by commercially available systems. The proposed method uses local, block-wise DCT to extract the features of the image. Results show that this appearance based method gives high identification rates, improved tolerance to variation in illumination, and simple implementation.
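A minimal Python sketch of local, block-wise DCT feature extraction follows. It is an illustration under stated assumptions, not the paper's method: the block size, the number of retained coefficients, the choice to keep the top-left low-frequency square of each block, and the function names are all hypothetical, since the abstract does not give these details.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n-by-n matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def local_dct_features(image, block=8, keep=3):
    """Concatenate the keep-by-keep low-frequency DCT coefficients of each block.

    The image is tiled into non-overlapping block-by-block patches; partial
    patches at the right and bottom edges are ignored.
    """
    c = dct_matrix(block)
    height, width = image.shape
    features = []
    for r in range(0, height - block + 1, block):
        for col in range(0, width - block + 1, block):
            coeffs = c @ image[r:r + block, col:col + block] @ c.T  # 2-D DCT
            features.append(coeffs[:keep, :keep].ravel())
    return np.concatenate(features)
```

The resulting vector could be passed to any classifier; which classifier and distance measure the authors used is not stated in the abstract.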


Speech Communication | 2014

An improved speech transmission index for intelligibility prediction

Belinda Marie Schwerin; Kuldip Kumar Paliwal

The speech transmission index (STI) is a well-known measure of intelligibility, most suited to the evaluation of speech intelligibility in rooms, with stimuli subjected to additive noise and reverberation. However, STI and its many variations do not effectively represent the intelligibility of stimuli containing non-linear distortions such as those resulting from processing by enhancement algorithms. In this paper, we revisit the STI approach and propose a variation which processes the modulation envelope in short-time segments, requiring only an assumption of quasi-stationarity (rather than the stationarity assumption of STI) of the modulation signal. Results presented in this work show that the proposed approach improves the measure's correlation to subjective intelligibility scores compared to traditional STI for stimuli corrupted by a range of noise types and processed by different enhancement approaches. The approach is also shown to have higher correlation than other coherence, correlation and distance measures tested, but is unsuited to the evaluation of stimuli heavily distorted with (for example) masking based processing, where an alternative approach such as STOI is recommended.


Speech Communication | 2016

Phase distortion resulting in a just noticeable difference in the perceived quality of speech

Roger Chappel; Belinda Marie Schwerin; Kuldip Kumar Paliwal

Many enhancement methods only suppress noise in magnitude spectrum, and use noisy or degraded phase in signal reconstruction. Degradation of the phase spectrum reduces stimuli quality where SNR is low. Where instantaneous-SNR (I-SNR) is greater than 7 dB, distortion in phase does not audibly degrade speech quality. For I-SNR lower than 7 dB, distortion is audible, and further processing of phase could improve quality. Where magnitude also includes distortion, I-SNR corresponding to a JND in speech quality due to phase distortion is reduced.

Common speech enhancement methods based on the short-time Fourier analysis-modification-synthesis (AMS) framework, modify the magnitude spectrum while keeping the phase spectrum unchanged. This is justified by an assumption that the phase spectrum can be seen as unimportant to speech quality, and hence the noisy phase spectrum can be used as a reasonable estimate of the clean phase spectrum in signal reconstruction. In this work we show, by using an ideal magnitude estimator, that corruption in the phase spectrum can still affect the quality of the resulting speech in low SNR environments. Furthermore, we quantify the distortion in the phase spectrum which can be tolerated before it begins to affect speech quality. This is done through a series of experiments, using both subjective and objective tests, and statistical analysis to evaluate the results. The results show that the phase spectrum computed from noisy speech can be used as an estimate of the phase spectrum of the clean signal without noticeably affecting perceived speech quality, only if the segmental SNR of the noisy speech signal is greater than 7 dB.
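To make the 7 dB threshold concrete, the Python sketch below computes a per-frame SNR and flags frames that fall below a threshold. This is only an illustration: the frame length, hop, function names and the plain energy-ratio definition of per-frame SNR are assumptions, and the paper's exact definition of instantaneous or segmental SNR may differ.

```python
import numpy as np

def frame_snr_db(clean, noisy, frame_len=256, hop=128, eps=1e-12):
    """Per-frame SNR in dB, taking noise as the difference noisy - clean."""
    noise = noisy - clean
    n_frames = 1 + (len(clean) - frame_len) // hop
    snr = np.empty(n_frames)
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + frame_len)
        signal_energy = np.sum(clean[seg] ** 2)
        noise_energy = np.sum(noise[seg] ** 2)
        # eps guards against division by zero and log of zero in silent frames
        snr[i] = 10.0 * np.log10((signal_energy + eps) / (noise_energy + eps))
    return snr

def frames_below_threshold(snr_db, threshold_db=7.0):
    """Boolean mask of frames whose SNR is below the threshold."""
    return snr_db < threshold_db
```

Under the finding reported above, the flagged frames are the ones where reusing the noisy phase would be expected to audibly affect quality.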


Archive | 2015

Modulation Processing for Speech Enhancement

Kuldip Kumar Paliwal; Belinda Marie Schwerin

Many traditional speech enhancement methods reduce noise from corrupted speech by processing the magnitude spectrum in a short-time Fourier analysis-modification-synthesis (AMS) based framework. More recently, use of the modulation domain for speech processing has been investigated; however, early efforts in this direction did not account for the changing properties of the modulation spectrum across time. Motivated by this and evidence of the significance of the modulation domain, we investigated the processing of the modulation spectrum on a short-time basis for speech enhancement. For this purpose, a modulation domain-based AMS framework was used, in which the trajectories of each acoustic frequency bin were processed frame-wise in a secondary AMS framework. A number of different enhancement algorithms were investigated for the enhancement of speech in the short-time modulation domain. These included spectral subtraction and MMSE magnitude estimation. In each case, the respective algorithm was used to modify the short-time modulation magnitude spectrum within the modulation AMS framework. Here we review the findings of this investigation, comparing the quality of stimuli enhanced using these modulation based approaches to stimuli enhanced using corresponding modification algorithms applied in the acoustic domain. Results presented show modulation domain based approaches to have improved quality compared to their acoustic domain counterparts. Further, MMSE modulation magnitude estimation (MME) is shown to have improved speech quality compared to modulation spectral subtraction (ModSSub) stimuli. MME stimuli are found to have good removal of noise without the introduction of musical noise, problematic in spectral subtraction based enhancement. Results also show that ModSSub has minimal musical noise compared to acoustic spectral subtraction, for appropriately selected modulation frame duration. For modulation domain based methods, modulation frame duration is shown to be an important parameter, with quality generally improved by use of shorter frame durations. From the results of experiments conducted, it is concluded that the short-time modulation domain provides an effective alternative to the short-time acoustic domain for speech processing. Further, in this domain, MME provides effective noise suppression without the introduction of musical noise distortion.


International Conference on Signal Processing and Communication Systems | 2016

A comparison of estimation methods in the discrete cosine transform modulation domain for speech enhancement

Aidan E. W. George; Christine Pickersgill; Belinda Marie Schwerin; Stephen So

In this paper, we present a new speech enhancement method that processes noise-corrupted speech in the discrete cosine transform (DCT) modulation domain. In contrast to the Fourier transform, the DCT produces a real-valued signal. Therefore, modulation-based processing in the DCT domain may allow both acoustic Fourier magnitude and phase information to be jointly estimated. Based on segmental SNR and the results of blind subjective listening tests on speech corrupted with various coloured noises, the application of the subspace method in the DCT modulation domain processing was found to outperform all other methods evaluated, including the LogMMSE method.
