Publication


Featured research published by Russell Mason.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Dynamic Precedence Effect Modeling for Source Separation in Reverberant Environments

Christopher Hummersone; Russell Mason; Tim Brookes

Reverberation continues to present a major problem for sound source separation algorithms. However, humans demonstrate a remarkable robustness to reverberation and many psychophysical and perceptual mechanisms are well documented. The precedence effect is one of these mechanisms; it aids our ability to localize sounds in reverberation. Despite this, relatively little work has been done on incorporating the precedence effect into automated source separation. Furthermore, no work has been carried out on adapting a precedence model to the acoustic conditions under test and it is unclear whether such adaptation, analogous to the perceptual Clifton effect, is even necessary. Hence, this study tests a previously proposed binaural separation/precedence model in real rooms with a range of reverberant conditions. The precedence model inhibitory time constant and inhibitory gain are varied in each room in order to establish the necessity for adaptation to the acoustic conditions. The paper concludes that adaptation is necessary and can yield significant gains in separation performance. Furthermore, it is shown that the initial time delay gap and the direct-to-reverberant ratio are important factors when considering this adaptation.


Journal of the Acoustical Society of America | 2005

Frequency dependency of the relationship between perceived auditory source width and the interaural cross-correlation coefficient for time-invariant stimuli

Russell Mason; Tim Brookes; Francis Rumsey

Previous research has indicated that the relationship between the interaural cross-correlation coefficient (IACC) of a narrow-band sound and its perceived auditory source width is dependent on its frequency. However, this dependency has not been investigated in sufficient detail for researchers to be able to properly model it in order to produce a perceptually relevant IACC-based model of auditory source width. A series of experiments has therefore been conducted to investigate this frequency dependency in a controlled manner, and to derive an appropriate model. Three main factors were discovered in the course of these experiments. First, the nature of the frequency dependency of the perceived auditory source width of stimuli with an IACC of 1 was determined, and an appropriate mathematical model was derived. Second, the loss of perceived temporal detail at high frequencies, caused by the breakdown of phase locking in the ear, was found to be relevant, and the model was modified accordingly using rectification and a low-pass filter. Finally, it was found that there was a further frequency dependency at low frequencies, and a method for modeling this was derived. The final model was shown to predict the experimental data well.
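For readers unfamiliar with the IACC, the following is a minimal sketch of one common broadband definition: the maximum absolute value of the normalised cross-correlation between the two ear signals over lags of about ±1 ms. It is not the frequency-dependent model derived in the paper, which additionally uses rectification and low-pass filtering. The function name `iacc` and the `max_lag_ms` parameter are assumptions made for illustration.

```python
import math


def iacc(left, right, sample_rate, max_lag_ms=1.0):
    """Maximum absolute normalised cross-correlation within +/- max_lag_ms.

    Illustrative broadband definition only; names and defaults are assumptions.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    n = len(left)
    max_lag = int(round(sample_rate * max_lag_ms / 1000.0))
    # Normalise by the geometric mean of the two signal energies.
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if norm == 0.0:
        return 0.0
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        best = max(best, abs(acc) / norm)
    return best


# Example: a 500 Hz tone presented identically to both ears.
fs = 8000
tone = [math.sin(2 * math.pi * 500 * t / fs) for t in range(160)]
identical = iacc(tone, tone, fs)
delayed = iacc(tone, [0.0] * 4 + tone[:-4], fs)
```

Identical ear signals give a value of 1, the condition examined in the first experiment described above; a small interaural delay still yields a value close to 1 because the search covers nonzero lags.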


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Ideal Binary Mask Ratio: A Novel Metric for Assessing Binary-Mask-Based Sound Source Separation Algorithms

Christopher Hummersone; Russell Mason; Tim Brookes

A number of metrics have been proposed in the literature to assess sound source separation algorithms. The addition of convolutional distortion raises further questions about the assessment of source separation algorithms in reverberant conditions, as reverberation is shown to undermine the optimality of the ideal binary mask (IBM) in terms of signal-to-noise ratio (SNR). Furthermore, with a range of mixture parameters common across numerous acoustic conditions, SNR-based metrics demonstrate an inconsistency that can only be attributed to the convolutional distortion. This suggests the necessity for an alternate metric in the presence of convolutional distortion, such as reverberation. Consequently, a novel metric, dubbed the IBM ratio (IBMR), is proposed for assessing source separation algorithms that aim to calculate the IBM. The metric is robust to many of the effects of convolutional distortion on the output of the system and may provide a more representative insight into the performance of a given algorithm.
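As background, the sketch below shows how an ideal binary mask is conventionally built from known target and interferer energies in each time-frequency unit, using a local SNR threshold (0 dB here). The abstract does not give the IBMR formula, so the `mask_agreement` helper is a hypothetical stand-in that simply reports the fraction of matching units; it should not be read as the paper's metric.

```python
import math


def ideal_binary_mask(target_energy, interferer_energy, threshold_db=0.0):
    """Return 1 where the local target-to-interferer ratio exceeds threshold_db."""
    mask = []
    for t_row, i_row in zip(target_energy, interferer_energy):
        row = []
        for t, i in zip(t_row, i_row):
            if i <= 0.0:
                # No interferer energy: keep the unit only if the target is present.
                row.append(1 if t > 0.0 else 0)
            elif t <= 0.0:
                row.append(0)
            else:
                row.append(1 if 10.0 * math.log10(t / i) > threshold_db else 0)
        mask.append(row)
    return mask


def mask_agreement(estimated, ideal):
    """Fraction of time-frequency units on which two masks agree.

    Hypothetical illustration only; this is NOT the IBMR defined in the paper.
    """
    total = 0
    same = 0
    for e_row, i_row in zip(estimated, ideal):
        for e, i in zip(e_row, i_row):
            total += 1
            same += 1 if e == i else 0
    return same / total if total else 0.0


# Two time frames by two frequency channels of made-up energies.
ibm = ideal_binary_mask([[4.0, 1.0], [0.5, 2.0]], [[1.0, 1.0], [1.0, 1.0]])
agreement = mask_agreement([[1, 0], [1, 1]], ibm)
```

Comparing masks directly, rather than comparing output signals by SNR, is the general idea the abstract motivates for reverberant conditions.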


European Signal Processing Conference | 2016

Evaluation of audio source separation models using hypothesis-driven non-parametric statistical methods

Andrew J. R. Simpson; Gerard Roma; Emad M. Grais; Russell Mason; Christopher Hummersone; Antoine Liutkus; Mark D. Plumbley

Audio source separation models are typically evaluated using objective separation quality measures, but rigorous statistical methods have yet to be applied to the problem of model comparison. As a result, it can be difficult to establish whether or not reliable progress is being made during the development of new models. In this paper, we provide a hypothesis-driven statistical analysis of the results of the recent source separation SiSEC challenge involving twelve competing models tested on separation of voice and accompaniment from fifty pieces of “professionally produced” contemporary music. Using non-parametric statistics, we establish reliable evidence for meaningful conclusions about the performance of the various models.
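The abstract does not name the specific tests used. As one example of a non-parametric comparison between two models scored on the same pieces, here is a minimal sketch of an exact paired sign-flip permutation test. The function name and the example scores are invented for illustration and are not taken from the SiSEC data.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Two-sided exact permutation p-value for paired score differences.

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so every sign pattern is enumerated. Practical only for small samples.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    if len(scores_a) > 20:
        raise ValueError("exact enumeration is limited to 20 pairs in this sketch")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if stat >= observed - 1e-12:
            count += 1
    return count / total


# Made-up separation scores (e.g. in dB) for two models on four pieces.
p_value = paired_sign_flip_test([5.0, 6.0, 7.0, 8.0], [4.0, 5.0, 6.0, 7.0])
```

With only four pieces the smallest achievable two-sided p-value is 2/16, which illustrates why the number of test items matters when drawing conclusions about model rankings.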


Journal of the Acoustical Society of America | 2013

Modeling listener distraction resulting from audio-on-audio interference

Jon Francombe; Russell Mason; Martin Dewhirst; Søren Bech

As devices that produce audio become more commonplace and increasingly portable, situations in which two competing audio programs are present occur more regularly. In order to support the design of systems intended to mitigate the effects of interfering audio (including sound field control, noise cancelation or source separation systems), it is desirable to model the perceived distraction in such situations. Distraction ratings were collected for a range of audio-on-audio interference situations including various target and interferer programs at three interferer levels, with and without road noise. Time-frequency target-to-interferer ratio (TIR) maps of the stimuli were created using a simple auditory model. A number of feature sets were extracted from the TIR maps, including combinations of mean, standard deviation, minimum and maximum TIR taken across the duration of the program item. In order to predict distraction ratings from the features, linear regression models were produced. The models were evaluated for goodness-of-fit (RMSE) and generalizability (using a K-fold cross-validation procedure). The best model performed well, with almost all predictions falling within the 95% confidence intervals of the perceptual data. A validation data set was used to test the model, suggesting areas for future improvement.
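The evaluation procedure described above (linear regression scored by RMSE under K-fold cross-validation) can be sketched as follows. This is a simplified, single-feature illustration: the feature is assumed to be a mean TIR value per stimulus, the fold assignment is a simple interleave, and the data are synthetic. None of these choices come from the paper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def k_fold_rmse(xs, ys, k):
    """Mean held-out RMSE over k interleaved folds."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        preds = [slope * xs[i] + intercept for i in test_idx]
        errors.append(rmse(preds, [ys[i] for i in test_idx]))
    return sum(errors) / k


# Synthetic data: hypothetical mean TIR (dB) against a perfectly linear rating.
mean_tir = [float(x) for x in range(10)]
ratings = [2.0 * x + 1.0 for x in mean_tir]
cv_error = k_fold_rmse(mean_tir, ratings, k=5)
```

Fitting on the training folds only, then scoring on the held-out fold, is what distinguishes the generalizability estimate from the goodness-of-fit RMSE computed on all data.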


International Conference on Latent Variable Analysis and Signal Separation | 2017

Psychophysical Evaluation of Audio Source Separation Methods

Andrew J. R. Simpson; Gerard Roma; Emad M. Grais; Russell Mason; Christopher Hummersone; Mark D. Plumbley

Source separation evaluation is typically a top-down process, starting with perceptual measures which capture fitness-for-purpose and followed by attempts to find physical (objective) measures that are predictive of the perceptual measures. In this paper, we take a contrasting bottom-up approach. We begin with the physical measures provided by the Blind Source Separation Evaluation Toolkit (BSS Eval) and we then look for corresponding perceptual correlates. This approach is known as psychophysics and has the distinct advantage of leading to interpretable, psychophysical models. We obtained perceptual similarity judgments from listeners in two experiments featuring vocal sources within musical mixtures. In the first experiment, listeners compared the overall quality of vocal signals estimated from musical mixtures using a range of competing source separation methods. In a loudness experiment, listeners compared the loudness balance of the competing musical accompaniment and vocal. Our preliminary results provide provisional validation of the psychophysical approach.


Journal of the Acoustical Society of America | 2013

The computational prediction of masking thresholds for ecologically valid interference scenarios

Khan Baykaner; Christopher Hummersone; Russell Mason; Søren Bech

Auditory interference scenarios, where a listener wishes to attend to some target audio while being presented with interfering audio, are prevalent in daily life. The goal of developing an accurate computational model which can predict masking thresholds for such scenarios has not yet been met. While some sophisticated, physiologically inspired masking prediction models exist, they are rarely tested with ecologically valid programs (such as music and speech). In order to test the accuracy of model predictions, human listener data is required. To that end, a masking threshold experiment was conducted for a variety of target and interferer programs. The results were analyzed alongside predictions made by the computational auditory signal processing and prediction model described by Jepsen et al. (2008). Masking thresholds were predicted to within 3 dB root mean squared error, with the greatest prediction inaccuracies occurring in the presence of speech. These results are comparable to those of the model by Glasberg and Moore (2005) for predicting the audibility of time-varying sounds in the presence of background sounds, which otherwise represent the most accurate predictions of this type in the literature.


Quality of Multimedia Experience | 2016

Determining and labeling the preference dimensions of spatial audio replay

Jon Francombe; Tim Brookes; Russell Mason; James Woodcock

There are currently many spatial audio reproduction systems in domestic use (e.g. mono, stereo, surround sound, sound bars, and headphones). In an experiment, pairwise preference magnitude ratings for a range of such systems were collected from trained and untrained listeners. The ratings were analysed using internal preference mapping to: (i) uncover the principal perceptual dimensions of listener preference; (ii) label the dimensions based on important perceptual attributes; and (iii) observe differences between trained and untrained listeners. To aid with labelling the dimensions, perceptual attributes were elicited alongside the preference ratings and were analysed by: (i) considering a metric derived from the frequency of use of each attribute and the magnitude of the related preference judgements; and (ii) observing attribute use for comparisons between specific methods. The first preference dimension accounted for over 90% of the variance in ratings; all participants exhibited a preference for reproduction methods that were positively correlated with the first dimension (most notably 5-, 9-, and 22-channel surround sound). This dimension was related to multiple important attributes, including those associated with spatial capability and absence of distortions. The second dimension accounted for only a very small proportion of the variance, and appeared to separate the headphone method from the other methods. The trained and untrained listeners generally showed opposite preferences in the second dimension, suggesting that trained listeners have a higher preference for headphone reproduction than untrained listeners.


Journal of the Acoustical Society of America | 2016

Eliciting the most prominent perceived differences between microphones

Andy Pearce; Tim Brookes; Martin Dewhirst; Russell Mason

The attributes contributing to the differences perceived between microphones (when auditioning recordings made with those microphones) are not clear from previous research. Consideration of technical specifications and expert opinions indicated that recording five programme items with eight studio and two microelectromechanical system microphones could allow determination of the attributes related to the most prominent inter-microphone differences. Pairwise listening comparisons between the resulting 50 recordings, followed by multi-dimensional scaling analysis, revealed up to 5 salient dimensions per programme item; 17 corresponding pairs of recordings were selected exemplifying the differences across those dimensions. Direct elicitation and panel discussions on the 17 pairs identified a hierarchy of 40 perceptual attributes. An attribute contribution experiment on the 31 lowest-level attributes in the hierarchy allowed them to be ordered by degree of contribution and showed brightness, harshness, and clarity to always contribute highly to perceived inter-microphone differences. This work enables the future development of objective models to predict these important attributes.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Selection of temporal windows for the computational prediction of masking thresholds

Khan Baykaner; Christopher Hummersone; Russell Mason; Søren Bech

In the field of auditory masking threshold predictions an optimal method for buffering a continuous, ecologically valid programme combination into discrete temporal windows has yet to be determined. An investigation was carried out into the use of a variety of temporal window durations, shapes, and steps, in order to discern the resultant effect upon the accuracy of various masking threshold prediction models. Selection of inappropriate temporal windows can triple the prediction error in some cases. Overlapping windows were found to produce the lowest errors provided that the predictions were smoothed appropriately. The optimal window shape varied across the tested models. The most accurate variant of each model resulted in root mean squared errors of 2.3, 3.4, and 4.2 dB.
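The three window parameters investigated above (duration, shape, and step) can be illustrated with a minimal framing sketch. The function names, the two supported shapes, and the example values are assumptions for illustration; the paper's actual window set is not given in the abstract.

```python
import math


def hann(n):
    """Symmetric Hann window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(signal, window_len, step, shape="rect"):
    """Split a signal into windows of window_len samples, advancing by step.

    A step smaller than window_len gives overlapping windows.
    """
    if shape == "rect":
        weights = [1.0] * window_len
    elif shape == "hann":
        weights = hann(window_len)
    else:
        raise ValueError("shape must be 'rect' or 'hann' in this sketch")
    frames = []
    for start in range(0, len(signal) - window_len + 1, step):
        frames.append([signal[start + i] * weights[i] for i in range(window_len)])
    return frames


# Ten samples, four-sample windows, 50 % overlap.
samples = list(range(10))
rect_frames = frame_signal(samples, window_len=4, step=2)
hann_frames = frame_signal(samples, window_len=4, step=2, shape="hann")
```

A masking threshold model would then be run on each frame, with the per-frame predictions smoothed afterwards, which is the condition under which the abstract reports overlapping windows performing best.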
