Steven van de Par
Philips
Publications
Featured research published by Steven van de Par.
EURASIP Journal on Advances in Signal Processing | 2005
Jeroen Breebaart; Steven van de Par; Armin Kohlrausch; Erik Gosuinus Petrus Schuijers
Parametric-stereo coding is a technique to efficiently code a stereo audio signal as a monaural signal plus a small amount of parametric overhead to describe the stereo image. The stereo properties are analyzed, encoded, and reinstated in a decoder according to spatial psychoacoustical principles. The monaural signal can be encoded using any (conventional) audio coder. Experiments show that the parameterized description of spatial properties enables a highly efficient, high-quality stereo audio representation.
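As a rough illustration of the analysis side of this idea, the sketch below (a minimal sketch under assumed choices of band count, downmix, and parameters, not the codec described in the paper) reduces a stereo frame to a mono downmix plus per-band level-difference and coherence parameters:

```python
# Minimal parametric-stereo analysis sketch (illustrative, not the paper's codec):
# a stereo frame becomes a mono downmix plus per-band spatial parameters.
import numpy as np

def analyze_frame(left, right, n_bands=8):
    """Return a mono downmix plus per-band IID (dB) and inter-channel coherence."""
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    edges = np.linspace(0, len(L), n_bands + 1, dtype=int)  # crude uniform bands
    iid, ic = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        pL = np.sum(np.abs(L[lo:hi]) ** 2) + 1e-12
        pR = np.sum(np.abs(R[lo:hi]) ** 2) + 1e-12
        cross = np.abs(np.sum(L[lo:hi] * np.conj(R[lo:hi])))
        iid.append(10 * np.log10(pL / pR))      # inter-channel intensity difference
        ic.append(cross / np.sqrt(pL * pR))     # normalized coherence, 0..1
    mono = 0.5 * (left + right)                 # monaural downmix for the core coder
    return mono, np.array(iid), np.array(ic)
```

A decoder would invert this per band: distribute the mono signal over the two channels according to the IID and mix in a decorrelated copy where the coherence is below one.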
Journal of the Acoustical Society of America | 2001
Jeroen Breebaart; Steven van de Par; Armin Kohlrausch
This article presents a quantitative binaural signal detection model which extends the monaural model described by Dau et al. [J. Acoust. Soc. Am. 99, 3615-3622 (1996)]. The model is divided into three stages. The first stage comprises peripheral preprocessing in the right and left monaural channels. The second stage is a binaural processor which produces a time-dependent internal representation of the binaurally presented stimuli. This stage is based on the Jeffress delay line extended with tapped attenuator lines. Through this extension, the internal representation codes both interaural time and intensity differences. In contrast to most present-day models, which are based on excitatory-excitatory interaction, the binaural interaction in the present model is based on contralateral inhibition of ipsilateral signals. The last stage, a central processor, extracts a decision variable that can be used to detect the presence of a signal in a detection task, but could also derive information about the position and the compactness of a sound source. In two accompanying articles, the model predictions are compared with data obtained with human observers in a great variety of experimental conditions.
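The contralateral-inhibition stage lends itself to a schematic sketch. The code below (illustrative parameter ranges; it omits the peripheral preprocessing and adaptation stages of the published model) evaluates a grid of elements, each tuned to an internal delay and attenuation, so that minima in the residual activity code both interaural time and intensity differences:

```python
# Schematic excitation-inhibition (EI) grid: each element delays and attenuates
# the contralateral input and inhibits the ipsilateral one; small residues mark
# matching interaural time and intensity differences. Illustrative only.
import numpy as np

def ei_activity(left, right, fs, max_delay_ms=2.0, gain_range_db=(-10, 10), n_gains=41):
    max_lag = int(fs * max_delay_ms / 1000)
    delays = np.arange(-max_lag, max_lag + 1)
    gains = 10 ** (np.linspace(*gain_range_db, n_gains) / 20)
    activity = np.empty((len(delays), len(gains)))
    for i, d in enumerate(delays):
        shifted = np.roll(right, d)          # internal delay (circular, for brevity)
        for j, g in enumerate(gains):
            activity[i, j] = np.mean((left - g * shifted) ** 2)  # inhibition residue
    return delays / fs, gains, activity
```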
Attention, Perception, & Psychophysics | 2008
Rob van Eijk; Armin Kohlrausch; James F. Juola; Steven van de Par
When an audio-visual event is perceived in the natural environment, a physical delay will always occur between the arrival of the leading visual component and that of the trailing auditory component. This natural timing relationship suggests that the point of subjective simultaneity (PSS) should occur at an auditory delay greater than or equal to 0 msec. A review of the literature suggests that PSS estimates derived from a temporal order judgment (TOJ) task differ from those derived from a synchrony judgment (SJ) task, with (unnatural) auditory-leading PSS values reported mainly for the TOJ task. We report data from two stimulus types that differed in terms of complexity, namely, (1) a flash and a click and (2) a bouncing ball and an impact sound. The same participants judged the temporal order and synchrony of both stimulus types, using three experimental methods: (1) a TOJ task with two response categories (“audio first” or “video first”), (2) an SJ task with two response categories (“synchronous” or “asynchronous”; SJ2), and (3) an SJ task with three response categories (“audio first,” “synchronous,” or “video first”; SJ3). Both stimulus types produced correlated PSS estimates with the SJ tasks, but the estimates from the TOJ procedure were uncorrelated with those obtained from the SJ tasks. These results suggest that the SJ task should be preferred over the TOJ task when the primary interest is in perceived audio-visual synchrony.
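For readers unfamiliar with how a PSS is read off a TOJ data set, here is a hedged sketch of the standard analysis (a generic cumulative-Gaussian fit, not necessarily the authors' exact procedure; the response proportions in the usage example are invented for illustration):

```python
# Generic PSS estimation from TOJ data: fit a cumulative Gaussian to the
# proportion of "video first" responses versus audio delay; the 50% point
# is the point of subjective simultaneity (PSS). Illustrative sketch only.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def fit_pss(delays_ms, p_audio_first):
    """delays_ms: audio delay, positive when the audio trails the video."""
    model = lambda x, pss, sigma: norm.cdf(x, loc=pss, scale=sigma)
    p_video_first = 1.0 - np.asarray(p_audio_first)   # increases with audio delay
    (pss, sigma), _ = curve_fit(model, delays_ms, p_video_first, p0=(0.0, 50.0))
    return pss, sigma

# invented example proportions, for illustration only
delays = np.array([-200, -100, -50, 0, 50, 100, 200])        # ms
p_af = np.array([0.95, 0.80, 0.60, 0.45, 0.25, 0.10, 0.02])
print(fit_pss(delays, p_af))   # PSS near 0 ms, sigma on the order of tens of ms
```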
Journal of the Acoustical Society of America | 1997
Steven van de Par; Armin Kohlrausch
A new experimental technique for studying binaural processing at high frequencies is introduced. Binaural masking level differences (BMLDs) for the conditions N0Sπ and NπS0 were measured for a tonal signal in narrow-band noise at 125, 250, and 4000 Hz. In addition, transposed stimuli were generated, which were centered at 4000 Hz, but were designed to preserve within the envelope the temporal fine-structure information available at the two lower frequencies. The BMLDs measured with the 125-Hz transposed stimuli were essentially the same as BMLDs from the regular 125-Hz condition. The transposed 250-Hz stimuli generally produced smaller BMLDs than the stimuli centered at 250 Hz, but the pattern of results as a function of masker bandwidth was the same. The patterns of results from the transposed stimuli are different from those of the 4000-Hz condition and, consistent with the low-frequency masker data, generally show higher BMLDs. The results indicate that the mechanisms underlying binaural processing at low and high frequencies are similar, and that frequency-dependent differences in BMLDs probably reflect the inability of the auditory system to encode the temporal fine structure of high-frequency stimuli.
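The transposition construction can be sketched compactly (filter order and cutoff below are illustrative assumptions, not necessarily the paper's values): half-wave rectify the low-frequency waveform, low-pass filter it, and use the result to modulate the high-frequency carrier:

```python
# Sketch of a transposed stimulus: the temporal fine structure of a
# low-frequency waveform is moved into the envelope of a 4-kHz carrier.
import numpy as np
from scipy.signal import butter, sosfilt

def transpose_stimulus(low_freq_wave, fs, carrier_hz=4000.0, cutoff_hz=500.0):
    halfwave = np.maximum(low_freq_wave, 0.0)              # half-wave rectification
    sos = butter(4, cutoff_hz / (fs / 2), output="sos")    # smooth the envelope
    envelope = sosfilt(sos, halfwave)
    t = np.arange(len(envelope)) / fs
    return envelope * np.sin(2 * np.pi * carrier_hz * t)   # envelope on the carrier
```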
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Tobias May; Steven van de Par; Armin Kohlrausch
Although extensive research has been done in the field of machine-based localization, the degrading effects of reverberation and of multiple simultaneous sources on localization performance have remained a major problem. Motivated by the ability of the human auditory system to robustly analyze complex acoustic scenes, the associated peripheral stage is used in this paper as a front-end to estimate the azimuth of sound sources based on binaural signals. One classical approach to localizing an acoustic source in the horizontal plane is to estimate the interaural time difference (ITD) between the two ears by searching for the maximum in the cross-correlation function. Apart from ITDs, the interaural level difference (ILD) can contribute to localization, especially at higher frequencies, where the wavelength becomes smaller than the diameter of the head, leading to ambiguous ITD information. The dependency of ITD and ILD on azimuth forms a complex pattern that also depends on the room acoustics and is therefore learned by azimuth-dependent Gaussian mixture models (GMMs). Multiconditional training is performed to take into account the variability of the binaural features that results from multiple sources and the effect of reverberation. The proposed localization model outperforms state-of-the-art localization techniques in simulated adverse acoustic conditions.
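A much-simplified sketch of this pipeline follows (not the authors' implementation: it omits the peripheral front-end and per-band processing, and the feature extractor and model setup are assumptions). Per-frame ITD and ILD features are extracted and scored against azimuth-dependent GMMs that would have been fitted beforehand under multiconditional training:

```python
# Simplified binaural localization sketch: broadband ITD from the
# cross-correlation maximum, broadband ILD from the level ratio, then
# maximum-likelihood azimuth selection over pre-trained GMMs.
import numpy as np
from sklearn.mixture import GaussianMixture

def itd_ild(left, right, fs, max_lag=32):
    lags = np.arange(-max_lag, max_lag + 1)
    xcorr = [np.sum(left * np.roll(right, k)) for k in lags]  # circular, for brevity
    itd = lags[int(np.argmax(xcorr))] / fs                    # seconds
    ild = 10 * np.log10((np.sum(left ** 2) + 1e-12) / (np.sum(right ** 2) + 1e-12))
    return np.array([itd, ild])

def localize(features, gmms):
    """gmms: dict azimuth (deg) -> GaussianMixture fitted on training features."""
    scores = {az: g.score(features.reshape(1, -1)) for az, g in gmms.items()}
    return max(scores, key=scores.get)                        # best-scoring azimuth
```

Training then amounts to fitting one GaussianMixture per candidate azimuth on features collected across varied rooms and interferer configurations.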
Journal of the Acoustical Society of America | 2001
Jeroen Breebaart; Steven van de Par; Armin Kohlrausch
This paper and two accompanying papers [Breebaart et al., J. Acoust. Soc. Am. 110, 1074-1088 (2001); 110, 1089-1104 (2001)] describe a computational model for the signal processing of the binaural auditory system. The model consists of several stages of monaural and binaural preprocessing combined with an optimal detector. Simulations of binaural masking experiments were performed as a function of temporal stimulus parameters and compared to psychophysical data adapted from the literature. For this purpose, the model was used as an artificial observer in a three-interval, forced-choice procedure. All model parameters were kept constant for all simulations. Model predictions were obtained as a function of the interaural correlation of a masking noise and as a function of both masker and signal duration. Furthermore, maskers with a time-varying interaural correlation were used. Predictions were also obtained for stimuli with time-varying interaural time or intensity differences. Finally, binaural forward-masking conditions were simulated. The results show that the combination of a temporal integrator followed by an optimal detector in the time domain can account for all conditions tested, except for those using periodically varying interaural time differences (ITDs) and those measuring interaural correlation just-noticeable differences (jnds) as a function of bandwidth.
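The artificial-observer role can be sketched in a few lines (a generic three-interval forced-choice trial wrapper; `decision_variable` is a hypothetical stand-in for the full binaural model's output):

```python
# Generic 3IFC trial for an artificial observer: the model computes a decision
# variable for each interval and picks the largest; the trial is correct if
# that interval actually contained the signal. Illustrative sketch only.
import numpy as np

def run_3ifc_trial(intervals, signal_index, decision_variable):
    """intervals: three stimuli, exactly one containing the signal."""
    dv = [decision_variable(x) for x in intervals]
    return int(np.argmax(dv)) == signal_index
```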
Journal of the Acoustical Society of America | 1999
Steven van de Par; Armin Kohlrausch
Thresholds for sinusoidal signals masked by noise of various bandwidths were obtained for three binaural configurations: N0S0 (both masker and signal interaurally in phase), N0Sπ (masker interaurally in phase and signal interaurally phase-reversed), and NπS0 (masker interaurally phase-reversed and signal interaurally in phase). Signal frequencies of 125, 250, 500, 1000, 2000, and 4000 Hz were combined with masker bandwidths of 5, 10, 25, 50, 100, 250, 500, 1000, 2000, 4000, and 8000 Hz, with the restriction that masker bandwidths never exceeded twice the signal frequency. The overall noise power was kept constant at 70 dB SPL for all bandwidths. Results, expressed as signal-to-total-noise power ratios, show that N0S0 thresholds generally decrease with increasing bandwidth, even for subcritical bandwidths. Only at frequencies of 2 and 4 kHz do thresholds appear to remain constant for bandwidths around the critical bandwidth. N0Sπ thresholds are generally less dependent on bandwidth up to two or three times the (monaural) critical bandwidth. Beyond this bandwidth, thresholds decrease with a slope similar to that of the N0S0 condition. NπS0 conditions show about the same bandwidth dependence as N0Sπ, but thresholds in the former condition are generally higher. This threshold difference is largest at low frequencies and disappears above 2 kHz. An explanation for the wider operational binaural critical bandwidth is given, which assumes that binaural disparities are combined across frequency in an optimally weighted way.
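The closing sentence appeals to optimal across-frequency weighting. In generic signal-detection terms (a textbook formulation, not necessarily the paper's exact model), combining independent per-channel sensitivities d'_i with weights w_i gives a combined sensitivity that is maximized when the weights are proportional to the per-channel sensitivities:

```latex
d'_{\mathrm{comb}} = \frac{\sum_i w_i\, d'_i}{\sqrt{\sum_i w_i^{2}}},
\qquad
w_i \propto d'_i \;\Longrightarrow\;
d'_{\mathrm{comb}} = \sqrt{\sum_i \left(d'_i\right)^{2}} .
```

Pooling disparities over several auditory channels in this way would make the operational binaural critical bandwidth appear wider than any single channel.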
Communication Acoustics / ed. by Jens Blauert | 2005
Armin Kohlrausch; Steven van de Par
In our natural environment, we simultaneously receive information through various sensory modalities. The properties of these stimuli are coupled by physical laws, so that, e.g., auditory and visual stimuli caused by the same event have a specific temporal, spatial, and contextual relation when reaching the observer. In speech, for example, visible lip movements and audible utterances occur in close synchrony, which contributes to the improvement of speech intelligibility under adverse acoustic conditions. Research into multi-sensory perception is currently being performed in a number of different experimental and application contexts. This chapter provides an overview of the typical research areas dealing with audio-visual interaction and integration, bridging the range from cognitive psychology to applied research for multi-media applications. A major part of this chapter deals with a variety of research questions related to the temporal relation between audio and video. Other issues of interest are basic spatio-temporal interaction, spatio-temporal effects in audio-visual stimuli (including the ventriloquist effect), cross-modal effects in attention, audio-visual interaction in speech perception, and interaction effects with respect to the perceived quality of audio-visual scenes.
Human Vision and Electronic Imaging Conference | 1999
Armin Kohlrausch; Steven van de Par
In our natural environment, we simultaneously receive information through various sensory modalities. The properties of these stimuli are coupled by physical laws, so that, e.g., auditory and visual stimuli caused by the same event have a fixed temporal relation when reaching the observer. In speech, for example, visible lip movements and audible utterances occur in close synchrony, which contributes to the improvement of speech intelligibility under adverse acoustic conditions. Research into multi-sensory perception is currently being performed in a great variety of experimental contexts. This paper attempts to give an overview of the typical research areas dealing with audio-visual interaction and integration, bridging the range from cognitive psychology to applied research for multimedia applications. Issues of interest are the sensitivity to asynchrony between audio and video signals, the interaction between audio-visual stimuli with discrepant spatial and temporal rate information, cross-modal effects in attention, audio-visual interactions in speech perception, and the combined perceived quality of audio-visual stimuli.
Journal of the Acoustical Society of America | 1998
Steven van de Par; Armin Kohlrausch
Detection thresholds were measured with a multiplied-noise masker that was in phase in both ears and a sinusoidal signal that was either in phase or out of phase (NoSo and NoSπ conditions). The masker was generated by multiplying a low-pass noise with a sinusoidal carrier. The signal was a sinusoid with the same frequency as the carrier and a constant phase offset, θ, with respect to the carrier. By adjusting the phase offset, the stimulus properties were varied in such a way that only interaural time delays (θ = π/2) or interaural intensity differences (θ = 0) were present within the NoSπ stimulus. Thresholds were measured at a center frequency of 4 kHz as a function of bandwidth for θ = π/2 and for θ = 0. In a second experiment, thresholds were measured for a bandwidth of 25 Hz as a function of the center frequency. The results show that narrow-band BMLDs at 4 kHz can amount to 30 dB for the θ = 0 condition. For this condition, narrow-band BMLDs are also reasonably constant across frequency, in contrast to results obtained with standard Gaussian-noise maskers. For θ = π/2, BMLDs are restricted to the frequency region below 2 kHz provided that the masker is narrow band, but BMLDs of up to 15 dB are found at 4 kHz if the masker bandwidth is 50 Hz or wider. The frequency dependence of the binaural thresholds seems to be best explained by assuming that the stimulus waveforms are compressed before binaural interaction.
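A minimal sketch of this stimulus construction (all parameter values are illustrative assumptions, not the paper's):

```python
# Multiplied-noise NoSpi stimulus: low-pass noise times a sinusoidal carrier as
# the masker, plus a signal sinusoid with phase offset theta added antiphasically
# across the ears. theta = 0 yields only interaural intensity differences;
# theta = pi/2 yields only interaural time differences.
import numpy as np
from scipy.signal import butter, sosfilt

def multiplied_noise_stimulus(fs=48000, dur=0.4, fc=4000.0, bw=25.0,
                              theta=np.pi / 2, sig_db=-10.0, seed=0):
    t = np.arange(int(fs * dur)) / fs
    rng = np.random.default_rng(seed)
    sos = butter(4, (bw / 2) / (fs / 2), output="sos")
    lowpass = sosfilt(sos, rng.standard_normal(len(t)))   # low-pass noise
    masker = lowpass * np.cos(2 * np.pi * fc * t)         # multiplied noise
    amp = 10 ** (sig_db / 20) * np.std(masker)            # signal level re masker
    signal = amp * np.cos(2 * np.pi * fc * t + theta)
    return masker + signal, masker - signal               # diotic masker, antiphasic signal
```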