Stephan D. Ewert
University of Oldenburg
Publication
Featured research published by Stephan D. Ewert.
Journal of the Acoustical Society of America | 2008
Morten Løve Jepsen; Stephan D. Ewert; Torsten Dau
A model of computational auditory signal-processing and perception that accounts for various aspects of simultaneous and nonsimultaneous masking in human listeners is presented. The model is based on the modulation filterbank model described by Dau et al. [J. Acoust. Soc. Am. 102, 2892 (1997)] but includes major changes at the peripheral and more central stages of processing. The model contains outer- and middle-ear transformations, a nonlinear basilar-membrane processing stage, a hair-cell transduction stage, a squaring expansion, an adaptation stage, a 150-Hz lowpass modulation filter, a bandpass modulation filterbank, a constant-variance internal noise, and an optimal detector stage. The model was evaluated in experimental conditions that reflect, to different degrees, effects of compression as well as spectral and temporal resolution in auditory processing. The experiments include intensity discrimination with pure tones and broadband noise, tone-in-noise detection, spectral masking with narrow-band signals and maskers, forward masking with tone signals and tone or noise maskers, and amplitude-modulation detection with narrow- and wideband noise carriers. The model can account for most of the key properties of the data and is more powerful than the original model. The model might be useful as a front end in technical applications.
Journal of the Acoustical Society of America | 2013
Søren Jørgensen; Stephan D. Ewert; Torsten Dau
The speech-based envelope power spectrum model (sEPSM) presented by Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] estimates the envelope power signal-to-noise ratio (SNRenv) after modulation-frequency selective processing. Changes in this metric were shown to account well for changes of speech intelligibility for normal-hearing listeners in conditions with additive stationary noise, reverberation, and nonlinear processing with spectral subtraction. In the latter condition, the standardized speech transmission index [(2003). IEC 60268-16] fails. However, the sEPSM is limited to conditions with stationary interferers, due to the long-term integration of the envelope power, and cannot account for increased intelligibility typically obtained with fluctuating maskers. Here, a multi-resolution version of the sEPSM is presented where the SNRenv is estimated in temporal segments with a modulation-filter dependent duration. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and noisy speech distorted by reverberation or spectral subtraction. The results support the hypothesis that the SNRenv is a powerful objective metric for speech intelligibility prediction.
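As a rough illustration of the multi-resolution idea, the sketch below computes an envelope-power SNR per temporal segment for one modulation-filter output. Several points are assumptions made for illustration and are not stated in the abstract: the SNRenv form (excess envelope power of the noisy speech over the noise alone, divided by the noise envelope power), the rule that the segment duration equals the inverse of the modulation-filter centre frequency, the power floor, and all function and parameter names.

```python
def envelope_power(segment):
    """AC power of an envelope segment (variance around its mean)."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment) / len(segment)


def segment_snr_env(env_mix, env_noise, fs, mod_fc, floor=1e-6):
    """Per-segment SNRenv for one modulation-filter output (hypothetical helper).

    env_mix, env_noise: modulation-filtered envelopes of speech+noise and noise alone
    fs: envelope sampling rate in Hz
    mod_fc: modulation-filter centre frequency in Hz
    """
    # Assumption: segment duration is the inverse of the filter centre frequency.
    seg_len = max(1, round(fs / mod_fc))
    snrs = []
    for start in range(0, len(env_mix) - seg_len + 1, seg_len):
        p_mix = envelope_power(env_mix[start:start + seg_len])
        # Floor the noise power to avoid division by zero (assumed safeguard).
        p_noise = max(envelope_power(env_noise[start:start + seg_len]), floor)
        # Assumed definition: excess envelope power relative to the noise alone.
        snrs.append(max(p_mix - p_noise, 0.0) / p_noise)
    return snrs
```

With short segments for high modulation frequencies and long segments for low ones, a dip in a fluctuating masker can raise the SNRenv in individual segments even when the long-term value stays low, which is the effect the multi-resolution version is meant to capture.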
Speech Communication | 2011
Mathias Dietz; Stephan D. Ewert; Volker Hohmann
Humans show a very robust ability to localize sounds in adverse conditions. Computational models of binaural sound localization and technical approaches to direction-of-arrival (DOA) estimation also perform well; however, both their binaural feature extraction and their strategies for further analysis differ in part from what is currently known about the human auditory system. This study investigates auditory-model-based DOA estimation, emphasizing known features and limitations of auditory binaural processing such as (i) high temporal resolution, (ii) a restricted frequency range for exploiting temporal fine structure, (iii) use of temporal envelope disparities, and (iv) a limited range to compensate for interaural time delay. DOA estimation performance was investigated for up to five concurrent speakers in free field and for up to three speakers in the presence of noise. The DOA errors in these conditions were always smaller than 5°. A condition with moving speakers was also tested, and up to three moving speakers could be tracked simultaneously. Analysis of DOA performance as a function of the binaural temporal resolution showed that the short time constants of about 5 ms employed by the auditory model were crucial for robustness against concurrent sources.
Journal of the Acoustical Society of America | 2011
Martin Klein-Hennig; Mathias Dietz; Volker Hohmann; Stephan D. Ewert
The auditory system is sensitive to interaural timing disparities in the fine structure and the envelope of sounds, each contributing important cues for lateralization. In this study, psychophysical measurements were conducted with customized envelope waveforms in order to investigate the isolated effect of different segments of a periodic, ongoing envelope on lateralization. One envelope cycle was composed of the four segments attack flank, hold duration, decay flank, and pause duration, which were independently varied to customize the envelope waveform. The envelope waveforms were applied to a 4-kHz sinusoidal carrier, and just noticeable envelope interaural time differences were measured in six normal hearing subjects. The results indicate that attack durations and pause durations prior to the attack are the most important stimulus characteristics for processing envelope timing disparities. The results were compared to predictions of three binaural lateralization models based on the normalized cross correlation coefficient. Two of the models included an additional stage to mimic neural adaptation prior to binaural interaction, involving either a single short time constant (5 ms) or a combination of five time constants up to 500 ms. It was shown that the model with the single short time constant accounted best for the data.
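The four-segment envelope described above can be sketched as follows. This is a minimal illustration, not the stimulus code used in the study: the squared-sine flank shape, the sampling rate, and the function names are assumptions. Only the segment order (attack, hold, decay, pause) and the 4-kHz sinusoidal carrier come from the abstract.

```python
import math


def envelope_cycle(attack_ms, hold_ms, decay_ms, pause_ms, fs=48000):
    """One envelope cycle: attack flank, hold, decay flank, pause (values in 0..1)."""
    def n(ms):
        return round(ms * fs / 1000)

    # Assumed flank shape: squared-sine ramp rising from 0 towards 1.
    attack = [math.sin(0.5 * math.pi * i / n(attack_ms)) ** 2 for i in range(n(attack_ms))]
    hold = [1.0] * n(hold_ms)
    # Mirror-image squared-cosine ramp falling from 1 towards 0.
    decay = [math.cos(0.5 * math.pi * i / n(decay_ms)) ** 2 for i in range(n(decay_ms))]
    pause = [0.0] * n(pause_ms)
    return attack + hold + decay + pause


def modulated_tone(cycle, n_cycles, fc=4000.0, fs=48000):
    """Repeat the envelope cycle and impose it on a sinusoidal carrier."""
    env = cycle * n_cycles
    return [e * math.sin(2 * math.pi * fc * i / fs) for i, e in enumerate(env)]
```

Because each segment has its own duration argument, one segment can be varied while the others are held fixed, which mirrors the independent variation of the segments described in the abstract. An envelope interaural time difference could then be introduced by delaying the envelope in one ear before applying it to the carrier.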
Journal of the Acoustical Society of America | 2008
Mathias Dietz; Stephan D. Ewert; Volker Hohmann
Psychoacoustic experiments were conducted to investigate the role and interaction of fine-structure and envelope-based interaural temporal disparities. A computational model for the lateralization of binaural stimuli, motivated by recent physiological findings, is suggested and evaluated against the psychoacoustic data. The model is based on the independent extraction of the interaural phase difference (IPD) from the stimulus fine-structure and envelope. Sinusoidally amplitude-modulated 1-kHz tones were used in the experiments. The lateralization from either carrier (fine-structure) or modulator (envelope) IPD was matched with an interaural level difference, revealing a nearly linear dependence for both IPD types up to 135°, independent of the modulation frequency. However, if a carrier IPD was traded with an opposed modulator IPD to produce a centered sound image, a carrier IPD of 45° required the largest opposed modulator IPD. The data could be modeled assuming a population of binaural neurons with a physiological distribution of the best IPDs clustered around 45° to 50°. The model was also used to predict the perceived lateralization of previously published data. Subject-dependent differences in the perceptual salience of fine-structure and envelope cues, also reported previously, could be modeled by individual weighting coefficients for the two cues.
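A stimulus of this kind, a sinusoidally amplitude-modulated 1-kHz tone with separately controllable carrier and modulator IPDs, can be sketched as below. The convention of applying both phase offsets to the right ear only, the full modulation depth, the sampling rate, and the function name are assumptions for illustration.

```python
import math


def sam_tone_pair(duration_s, fm, carrier_ipd_deg=0.0, mod_ipd_deg=0.0,
                  fc=1000.0, m=1.0, fs=48000):
    """Left/right SAM tones with independent carrier and modulator IPDs.

    Assumed convention: both IPDs are applied as phase offsets in the right ear.
    """
    c_ipd = math.radians(carrier_ipd_deg)
    m_ipd = math.radians(mod_ipd_deg)
    left, right = [], []
    for i in range(round(duration_s * fs)):
        t = i / fs
        left.append((1 + m * math.sin(2 * math.pi * fm * t))
                    * math.sin(2 * math.pi * fc * t))
        right.append((1 + m * math.sin(2 * math.pi * fm * t + m_ipd))
                     * math.sin(2 * math.pi * fc * t + c_ipd))
    return left, right
```

Setting `carrier_ipd_deg` and `mod_ipd_deg` to opposite signs gives the kind of opposed-cue stimulus used in the trading experiment, where the two IPD types pull the image in different directions.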
Journal of the Acoustical Society of America | 2007
Ulrike Dicke; Stephan D. Ewert; Torsten Dau; Birger Kollmeier
Periodic amplitude modulations (AMs) of an acoustic stimulus are presumed to be encoded in temporal activity patterns of neurons in the cochlear nucleus. Physiological recordings indicate that this temporal AM code is transformed into a rate-based periodicity code along the ascending auditory pathway. The present study suggests a neural circuit for the transformation from the temporal to the rate-based code. Due to the neural connectivity of the circuit, bandpass shaped rate modulation transfer functions are obtained that correspond to recorded functions of inferior colliculus (IC) neurons. In contrast to previous modeling studies, the present circuit does not employ a continuously changing temporal parameter to obtain different best modulation frequencies (BMFs) of the IC bandpass units. Instead, different BMFs result from varying the number of input units projecting onto different bandpass units. In order to investigate the compatibility of the neural circuit with a linear modulation filterbank analysis as proposed in psychophysical studies, complex stimuli such as tones modulated by the sum of two sinusoids, narrowband noise, and iterated rippled noise were processed by the model. The model accounts for the encoding of AM depth over a large dynamic range and for modulation frequency selective processing of complex sounds.
Brain Research | 2008
Mathias Dietz; Stephan D. Ewert; Volker Hohmann; Birger Kollmeier
A model of the effective processing of interaural timing disparities in the human auditory system is presented which provides modifications and extensions to existing models motivated by recent physiological findings. In particular, an established model of excitatory-inhibitory (EI) neuronal connectivity is complemented by a model that is based on a rate code derived from the interaural phase difference (IPD). The IPD model is shown to successfully simulate literature data on fine structure and envelope-based binaural detection and lateralization experiments. In order to investigate the processing of temporal fluctuations of interaural timing disparities, detection thresholds of broadband binaural-beat stimuli were measured in six normal-hearing listeners and were compared with model simulations. In a first experiment, the highest detectable beat frequency was found to be 96 Hz for a noise bandwidth of 550 Hz and 219 Hz for a bandwidth of 1100 Hz. Both models predicted lower thresholds, but performed increasingly better when the integration time constants of the binaural processors were reduced. In a second experiment, the signal-to-noise ratio at the detection threshold of binaural-beat stimuli mixed with interaurally uncorrelated noise was measured as a function of the beat frequency. The threshold increased by about 1.7 dB per octave, which was simulated similarly by both models. The results indicate that the primary temporal resolution of the binaural system for detecting interaural timing disparities is much higher than the temporal resolution found in higher auditory processes such as those supposedly involved in, e.g., masking.
Journal of the Acoustical Society of America | 2005
Christian Füllgrabe; Brian C. J. Moore; Laurent Demany; Stephan D. Ewert; Stanley Sheft; Christian Lorenzi
Recent studies suggest that an auditory nonlinearity converts second-order sinusoidal amplitude modulation (SAM) (i.e., modulation of SAM depth) into a first-order SAM component, which contributes to the perception of second-order SAM. However, conversion may also occur in other ways such as cochlear filtering. The present experiments explored the source of the first-order SAM component by investigating the ability to detect a 5-Hz, first-order SAM probe in the presence of a second-order SAM masker beating at the probe frequency. Detection performance was measured as a function of masker-carrier modulation frequency, phase relationship between the probe and masker modulator, and probe modulation depth. In experiment 1, the carrier was a 5-kHz sinusoid presented either alone or within a notched-noise masker in order to restrict off-frequency listening. In experiment 2, the carrier was a white noise. The data obtained in both carrier conditions are consistent with the existence of a modulation distortion component. However, the phase yielding poorest detection performance varied across experimental conditions between 0° and 180°, confirming that, in addition to nonlinear mechanisms, cochlear filtering and off-frequency listening play a role in second-order SAM perception. The estimated magnitude of the modulation distortion component ranges from 5% to 12%.
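The idea that a nonlinearity can turn second-order SAM into a first-order component can be illustrated numerically. The sketch below builds a second-order SAM envelope, in which the depth of a modulation at `fm` is itself modulated at `f2`, and measures the envelope component at `f2` before and after squaring. The envelope formula, the squaring operation as a stand-in for an auditory nonlinearity, the parameter values, and all names are assumptions for illustration. They are not the stimuli or the analysis of the study.

```python
import cmath
import math


def second_order_sam_envelope(duration_s, fm, f2, m0=0.5, m2=1.0, fs=1000):
    """Envelope 1 + m(t) * sin(2*pi*fm*t), with depth m(t) modulated at f2."""
    env = []
    for i in range(round(duration_s * fs)):
        t = i / fs
        depth = m0 * (1 + m2 * math.sin(2 * math.pi * f2 * t))
        env.append(1 + depth * math.sin(2 * math.pi * fm * t))
    return env


def component_amplitude(x, freq, fs=1000):
    """Amplitude of the sinusoidal component of x at freq (single-bin DFT)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq * i / fs) for i, v in enumerate(x))
    return 2 * abs(acc) / len(x)


env = second_order_sam_envelope(1.0, fm=50, f2=5)
before = component_amplitude(env, 5)                   # envelope as generated
after = component_amplitude([v * v for v in env], 5)   # after squaring
```

In this construction the unprocessed envelope has energy at `fm` and at `fm ± f2` but essentially none at `f2`, whereas the squared envelope contains a component at `f2` with amplitude of about `m0**2 * m2`. This is the sense in which a distortion product at the beat rate could interact with a first-order probe at the same rate.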
Hearing Research | 2014
Dirk Oetting; Thomas Brand; Stephan D. Ewert
Individual loudness perception can be assessed using categorical loudness scaling (CLS). The procedure does not require any training and is frequently used in clinics. The goal of this study was to investigate different methods of loudness-function estimation from CLS data in terms of their test-retest behaviour and to suggest an improved method compared to Brand and Hohmann (2002) for adaptive CLS. Four different runs of the CLS procedure were conducted using 13 normal-hearing and 11 hearing-impaired listeners. The following approaches for loudness-function estimation (fitting) by minimising the error between the data and loudness function were compared: Errors were defined both in level and in loudness direction, respectively. The hearing threshold level (HTL) was extracted from CLS by splitting the responses into an audible and an inaudible category. The extracted HTL was used as a fixed starting point of the loudness function. The uncomfortable loudness level (UCL) was estimated if presentation levels were not sufficiently high to yield responses in the upper loudness range, as often observed in practice. Compared to the original fitting method, the modified estimation of the HTL was closer to the pure-tone audiometric threshold. Results of a computer simulation for UCL estimation showed that the estimation error was reduced for data sets with sparse or absent responses in the upper loudness range. Overall, the suggested modifications lead to better test-retest behaviour. If CLS data are highly consistent over the whole loudness range, all fitting methods lead to almost equal loudness functions. A considerable advantage of the suggested fitting method is observed for data sets where the responses either show high standard deviations or where responses are not present in the upper loudness range. Both cases regularly occur in clinical practice.
Hearing Research | 2016
Steffen Kortlang; Manfred Mauermann; Stephan D. Ewert
People with sensorineural hearing loss generally suffer from a reduced ability to understand speech in complex acoustic listening situations, particularly when background noise is present. In addition to the loss of audibility, a mixture of suprathreshold processing deficits is possibly involved, such as altered basilar membrane compression and related changes, as well as a reduced ability for temporal coding. A series of six monaural psychoacoustic experiments at 0.5, 2, and 6 kHz was conducted with 18 subjects, divided equally into groups of young normal-hearing, older normal-hearing and older hearing-impaired listeners, aiming at disentangling the effects of age and hearing loss on psychoacoustic performance in noise. Random frequency modulation detection thresholds (RFMDTs) with a low-rate modulator in wide-band noise, and discrimination of a phase-jittered Schroeder-phase from a random-phase harmonic tone complex are suggested to characterize the individual ability of temporal processing. The outcome was compared to thresholds of pure tones and narrow-band noise, loudness growth functions, auditory filter bandwidths, and tone-in-noise detection thresholds. At 500 Hz, results suggest a contribution of temporal fine structure (TFS) to pure-tone detection thresholds. Significant correlation with auditory thresholds and filter bandwidths indicated an impact of frequency selectivity on TFS usability in wide-band noise. When controlling for the effect of threshold sensitivity, the listeners' age significantly correlated with tone-in-noise detection and RFMDTs in noise at 500 Hz, showing that older listeners were particularly affected by background noise at low carrier frequencies.