Christopher Hummersone
University of Surrey
Publication
Featured research published by Christopher Hummersone.
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Christopher Hummersone; Russell Mason; Tim Brookes
Reverberation continues to present a major problem for sound source separation algorithms. However, humans demonstrate a remarkable robustness to reverberation and many psychophysical and perceptual mechanisms are well documented. The precedence effect is one of these mechanisms; it aids our ability to localize sounds in reverberation. Despite this, relatively little work has been done on incorporating the precedence effect into automated source separation. Furthermore, no work has been carried out on adapting a precedence model to the acoustic conditions under test and it is unclear whether such adaptation, analogous to the perceptual Clifton effect, is even necessary. Hence, this study tests a previously proposed binaural separation/precedence model in real rooms with a range of reverberant conditions. The precedence model inhibitory time constant and inhibitory gain are varied in each room in order to establish the necessity for adaptation to the acoustic conditions. The paper concludes that adaptation is necessary and can yield significant gains in separation performance. Furthermore, it is shown that the initial time delay gap and the direct-to-reverberant ratio are important factors when considering this adaptation.
Archive | 2014
Christopher Hummersone; Toby Stokes; Tim Brookes
The ideal binary mask (IBM) is widely considered to be the benchmark for time–frequency-based sound source separation techniques such as computational auditory scene analysis (CASA). However, it is well known that binary masking introduces objectionable distortion, especially musical noise. This can make binary masking unsuitable for sound source separation applications where the output is auditioned. It has been suggested that soft masking reduces musical noise and leads to a higher quality output. A previously defined soft mask, the ideal ratio mask (IRM), is found to have similar properties to the IBM, may correspond more closely to auditory processes, and offers additional computational advantages. Consequently, the IRM is proposed as the goal of CASA. To further support this position, a number of studies are reviewed that show soft masks to provide superior performance to the IBM in applications such as automatic speech recognition and speech intelligibility. A brief empirical study provides additional evidence demonstrating the objective and perceptual superiority of the IRM over the IBM.
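As a rough illustration of the two masks discussed in this abstract, the sketch below computes an IBM and an IRM from per-cell target and interferer energies. It assumes one common formulation (IBM: 1 where the local SNR exceeds a threshold, IRM: target energy divided by total energy); the paper's exact definitions may differ, and the function names and the `threshold_db` parameter are hypothetical.

```python
import math

def ideal_binary_mask(target_energy, interferer_energy, threshold_db=0.0):
    """Return 1 for each time-frequency cell whose local SNR exceeds
    threshold_db, else 0. Inputs are 2-D lists of non-negative energies.
    (Assumed formulation, not taken from the paper.)"""
    mask = []
    for t_row, i_row in zip(target_energy, interferer_energy):
        row = []
        for t, i in zip(t_row, i_row):
            if t == 0:
                snr_db = -math.inf      # no target energy in this cell
            elif i == 0:
                snr_db = math.inf       # target only
            else:
                snr_db = 10.0 * math.log10(t / i)
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask

def ideal_ratio_mask(target_energy, interferer_energy):
    """Return the soft mask t / (t + i) per cell, with 0 for empty cells.
    (Assumed formulation, not taken from the paper.)"""
    mask = []
    for t_row, i_row in zip(target_energy, interferer_energy):
        row = []
        for t, i in zip(t_row, i_row):
            total = t + i
            row.append(t / total if total > 0 else 0.0)
        mask.append(row)
    return mask

# Toy 2x2 energy grids (made-up values for illustration only)
target = [[4.0, 1.0], [0.0, 9.0]]
interferer = [[1.0, 4.0], [2.0, 1.0]]
ibm = ideal_binary_mask(target, interferer)
irm = ideal_ratio_mask(target, interferer)
```

The IRM takes values between 0 and 1 rather than switching abruptly, which is the property the abstract associates with reduced musical noise.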
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Christopher Hummersone; Russell Mason; Tim Brookes
A number of metrics have been proposed in the literature to assess sound source separation algorithms. The addition of convolutional distortion raises further questions about the assessment of source separation algorithms in reverberant conditions, as reverberation is shown to undermine the optimality of the ideal binary mask (IBM) in terms of signal-to-noise ratio (SNR). Furthermore, with a range of mixture parameters common across numerous acoustic conditions, SNR-based metrics demonstrate an inconsistency that can only be attributed to the convolutional distortion. This suggests the necessity for an alternative metric in the presence of convolutional distortion, such as reverberation. Consequently, a novel metric, dubbed the IBM ratio (IBMR), is proposed for assessing source separation algorithms that aim to calculate the IBM. The metric is robust to many of the effects of convolutional distortion on the output of the system and may provide a more representative insight into the performance of a given algorithm.
European Signal Processing Conference | 2016
Andrew J. R. Simpson; Gerard Roma; Emad M. Grais; Russell Mason; Christopher Hummersone; Antoine Liutkus; Mark D. Plumbley
Audio source separation models are typically evaluated using objective separation quality measures, but rigorous statistical methods have yet to be applied to the problem of model comparison. As a result, it can be difficult to establish whether or not reliable progress is being made during the development of new models. In this paper, we provide a hypothesis-driven statistical analysis of the results of the recent source separation SiSEC challenge involving twelve competing models tested on separation of voice and accompaniment from fifty pieces of “professionally produced” contemporary music. Using non-parametric statistics, we establish reliable evidence for meaningful conclusions about the performance of the various models.
International Conference on Latent Variable Analysis and Signal Separation | 2017
Andrew J. R. Simpson; Gerard Roma; Emad M. Grais; Russell Mason; Christopher Hummersone; Mark D. Plumbley
Source separation evaluation is typically a top-down process, starting with perceptual measures which capture fitness-for-purpose and followed by attempts to find physical (objective) measures that are predictive of the perceptual measures. In this paper, we take a contrasting bottom-up approach. We begin with the physical measures provided by the Blind Source Separation Evaluation Toolkit (BSS Eval) and we then look for corresponding perceptual correlates. This approach is known as psychophysics and has the distinct advantage of leading to interpretable, psychophysical models. We obtained perceptual similarity judgments from listeners in two experiments featuring vocal sources within musical mixtures. In the first experiment, listeners compared the overall quality of vocal signals estimated from musical mixtures using a range of competing source separation methods. In a loudness experiment, listeners compared the loudness balance of the competing musical accompaniment and vocal. Our preliminary results provide provisional validation of the psychophysical approach.
Journal of the Acoustical Society of America | 2013
Khan Baykaner; Christopher Hummersone; Russell Mason; Søren Bech
Auditory interference scenarios, where a listener wishes to attend to some target audio while being presented with interfering audio, are prevalent in daily life. The goal of developing an accurate computational model that can predict masking thresholds for such scenarios has yet to be achieved. While some sophisticated, physiologically inspired masking prediction models exist, they are rarely tested with ecologically valid programs (such as music and speech). In order to test the accuracy of model predictions, human listener data is required. To that end, a masking threshold experiment was conducted for a variety of target and interferer programs. The results were analyzed alongside predictions made by the computational auditory signal processing and prediction model described by Jepsen et al. (2008). Masking thresholds were predicted to within 3 dB root mean squared error, with the greatest prediction inaccuracies occurring in the presence of speech. These results are comparable to those of the model by Glasberg and Moore (2005) for predicting the audibility of time-varying sounds in the presence of background sounds, which otherwise represent the most accurate predictions of this type in the literature.
International Conference on Acoustics, Speech, and Signal Processing | 2013
Khan Baykaner; Christopher Hummersone; Russell Mason; Søren Bech
In the field of auditory masking threshold predictions an optimal method for buffering a continuous, ecologically valid programme combination into discrete temporal windows has yet to be determined. An investigation was carried out into the use of a variety of temporal window durations, shapes, and steps, in order to discern the resultant effect upon the accuracy of various masking threshold prediction models. Selection of inappropriate temporal windows can triple the prediction error in some cases. Overlapping windows were found to produce the lowest errors provided that the predictions were smoothed appropriately. The optimal window shape varied across the tested models. The most accurate variant of each model resulted in root mean squared errors of 2.3, 3.4, and 4.2 dB.
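The processing chain this abstract describes (buffering into temporal windows, smoothing the per-window predictions, and scoring by root mean squared error) can be sketched roughly as follows. This is a generic illustration, not the authors' procedure: the rectangular windows, the moving-average smoother, and all function and parameter names are assumptions.

```python
import math

def frame_signal(samples, window_len, step):
    """Split samples into rectangular windows of window_len samples,
    advancing by step samples; windows overlap when step < window_len."""
    return [samples[start:start + window_len]
            for start in range(0, len(samples) - window_len + 1, step)]

def moving_average(values, width):
    """Smooth a sequence of per-window predictions with a centred
    moving average (shortened at the edges). width should be odd."""
    half = width // 2
    smoothed = []
    for k in range(len(values)):
        lo = max(0, k - half)
        hi = min(len(values), k + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def rmse(predicted, measured):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))

# Toy usage with made-up numbers: 50% overlapping windows of 4 samples
frames = frame_signal(list(range(10)), window_len=4, step=2)
smoothed = moving_average([0.0, 3.0, 6.0], width=3)
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Setting `step` equal to `window_len` gives non-overlapping windows, so the same helper covers both cases compared in the abstract.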
Archive | 2011
Christopher Hummersone
Journal of The Audio Engineering Society | 2013
Christopher Hummersone; Russell Mason; Tim Brookes
Journal of The Audio Engineering Society | 2007
Slawomir Zielinski; Philip Hardisty; Christopher Hummersone; Francis Rumsey