Martijn Baart
Tilburg University
Publications
Featured research published by Martijn Baart.
Neuropsychologia | 2014
Martijn Baart; Jeroen J. Stekelenburg; Jean Vroomen
Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read-induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used sine-wave speech (SWS) that was perceived as speech by half of the participants (they were in speech mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read-induced speeding up of the N1 occurred for listeners in both speech and non-speech mode. In contrast, if listeners were in speech mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after the (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode.
Cognition | 2009
Jean Vroomen; Martijn Baart
Upon hearing an ambiguous speech sound dubbed onto lipread speech, listeners adjust their phonetic categories in accordance with the lipread information that tells what the phoneme should be (recalibration). Here we used sine-wave speech (SWS) to show that this tuning effect occurs if the SWS sounds are perceived as speech, but not if the sounds are perceived as non-speech. In contrast, selective speech adaptation occurred irrespective of whether listeners were in speech or non-speech mode. These results provide new evidence for the distinction between a speech and a non-speech processing mode, and they demonstrate that different mechanisms underlie recalibration and selective speech adaptation.
Neuroscience Letters | 2010
Martijn Baart; Jean Vroomen
It is well known that visual information derived from mouth movements (i.e., lipreading) can have profound effects on auditory speech identification (e.g., the McGurk effect). Here we examined the reverse phenomenon, namely whether auditory speech affects lipreading. We report that speech sounds dubbed onto lipread speech affect immediate identification of lipread tokens. This effect likely reflects genuine cross-modal integration of sensory signals rather than a simple response bias, because we also observed adaptive shifts in visual identification of the ambiguous lipread tokens after exposure to incongruent audiovisual adapter stimuli. Presumably, listeners had learned to label the lipread stimulus in accordance with the sound, thus demonstrating that the interaction between hearing and lipreading is genuinely bi-directional.
Cognition | 2014
Martijn Baart; Jean Vroomen; Kathleen Shaw; Heather Bortfeld
Infants and adults are well able to match auditory and visual speech, but the cues on which they rely (viz. temporal, phonetic and energetic correspondence in the auditory and visual speech streams) may differ. Here we assessed the relative contribution of the different cues using sine-wave speech (SWS). Adults (N=52) and infants (N=34, aged between 5 and 15 months) matched two trisyllabic speech sounds (kalisu and mufapi), either natural or SWS, with visual speech information. On each trial, adults saw two articulating faces and matched a sound to one of these, while infants were presented the same stimuli in a preferential looking paradigm. Adults' performance was almost flawless with natural speech, but was significantly less accurate with SWS. In contrast, infants matched the sound to the articulating face equally well for natural speech and SWS. These results suggest that infants rely to a lesser extent on phonetic cues than adults do to match audio to visual speech. This is in line with the notion that the ability to extract phonetic information from the visual signal increases during development, and suggests that phonetic knowledge might not be the basis for early audiovisual correspondence detection in speech.
Language and Speech | 2009
Jean Vroomen; Martijn Baart
Listeners hearing an ambiguous speech sound flexibly adjust their phonetic categories in accordance with lipread information telling what the phoneme should be (recalibration). Here, we tested the stability of lipread-induced recalibration over time. Listeners were exposed to an ambiguous sound halfway between /t/ and /p/ that was dubbed onto a face articulating either /t/ or /p/. When tested immediately, listeners exposed to lipread /t/ were more likely to categorize the ambiguous sound as /t/ than listeners exposed to /p/. This aftereffect dissipated quickly with prolonged testing and did not reappear after a 24-hour delay. Recalibration of phonetic categories is thus a fragile phenomenon.
Experimental Brain Research | 2010
Martijn Baart; Jean Vroomen
Listeners use lipread information to adjust the phonetic boundary between two speech categories (phonetic recalibration, Bertelson et al. 2003). Here, we examined phonetic recalibration while listeners were engaged in a visuospatial or verbal working memory task under different memory load conditions. Phonetic recalibration, like selective speech adaptation, was not affected by a concurrent verbal or visuospatial memory task. This result indicates that phonetic recalibration is a low-level process that does not critically depend on processes used in verbal or visuospatial working memory.
Journal of Experimental Child Psychology | 2015
Martijn Baart; Heather Bortfeld; Jean Vroomen
The correspondence between auditory speech and lip-read information can be detected based on a combination of temporal and phonetic cross-modal cues. Here, we determined the point in developmental time at which children start to effectively use phonetic information to match a speech sound with one of two articulating faces. We presented 4- to 11-year-olds (N=77) with three-syllabic sine-wave speech replicas of two pseudo-words that were perceived as non-speech and asked them to match the sounds with the corresponding lip-read video. At first, children had no phonetic knowledge about the sounds, and matching was thus based on the temporal cues that are fully retained in sine-wave speech. Next, we trained all children to perceive the phonetic identity of the sine-wave speech and repeated the audiovisual (AV) matching task. Only at around 6.5 years of age did the benefit of having phonetic knowledge about the stimuli become apparent, thereby indicating that AV matching based on phonetic cues presumably develops more slowly than AV matching based on temporal cues.
Psychophysiology | 2016
Martijn Baart
Lip-read speech suppresses and speeds up the auditory N1 and P2 peaks, but these effects are not always observed or reported. Here, the robustness of lip-read-induced N1/P2 suppression and facilitation in phonetically congruent audiovisual speech was assessed by analyzing peak values that were taken from published plots and individual data. To determine whether adhering to the additive model of AV integration (i.e., A + V ≠ AV, or AV − V ≠ A) is critical for correct characterization of lip-read-induced effects on the N1 and P2, auditory data was compared to AV and to AV − V. On average, the N1 and P2 were consistently suppressed and sped up by lip-read information, with no indication that AV integration effects were significantly modulated by whether or not V was subtracted from AV. To assess the possibility that variability in observed N1/P2 amplitudes and latencies may explain why N1/P2 suppression and facilitation are not always found, additional correlations between peak values and size of the AV integration effects were computed. These analyses showed that N1/P2 peak values correlated with the size of AV integration effects. However, it also became apparent that a portion of the AV integration effects was characterized by lip-read-induced peak enhancements and delays rather than suppressions and facilitations, which, for the individual data, seemed related to particularly small/early A-only peaks and large/late AV(−V) peaks.
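To make the additive-model comparison above concrete, here is a minimal Python/NumPy sketch of how one might extract N1/P2 peak amplitude and latency from an auditory-only ERP and compare it both with the AV response and with the AV − V difference wave. The simulated waveforms, peak windows, sampling rate, and function names are illustrative assumptions, not the analysis pipeline reported in the paper.

```python
# Minimal sketch (assumed, not the paper's actual pipeline): compare lip-read-
# induced N1/P2 effects with and without the additive-model correction, i.e.
# test A vs. AV as well as A vs. (AV - V).
import numpy as np

FS = 500                              # sampling rate in Hz (assumption)
t = np.arange(-0.1, 0.5, 1.0 / FS)    # epoch from -100 to 500 ms
rng = np.random.default_rng(0)

def simulate_erp(n1_amp, n1_lat, p2_amp, p2_lat, noise=0.05):
    """Toy ERP: a negative N1 and a positive P2 Gaussian component plus noise."""
    erp = (n1_amp * np.exp(-((t - n1_lat) ** 2) / (2 * 0.015 ** 2))
           + p2_amp * np.exp(-((t - p2_lat) ** 2) / (2 * 0.025 ** 2)))
    return erp + noise * rng.standard_normal(t.size)

def peak_in_window(erp, tmin, tmax, polarity):
    """Return (amplitude, latency) of the extreme point within a time window.
    polarity = -1 picks the negative N1 peak, +1 the positive P2 peak."""
    mask = (t >= tmin) & (t <= tmax)
    idx = np.argmax(erp[mask] * polarity)
    return erp[mask][idx], t[mask][idx]

# Simulated grand averages: AV peaks are smaller and earlier than A-only,
# mimicking lip-read-induced suppression and facilitation.
A  = simulate_erp(-3.0, 0.100, 2.5, 0.200)   # auditory-only
AV = simulate_erp(-2.2, 0.090, 1.8, 0.190)   # audiovisual
V  = simulate_erp(-0.2, 0.100, 0.2, 0.200)   # visual-only, subtracted for AV - V

for label, comparison in (("A vs. AV", AV), ("A vs. AV-V", AV - V)):
    for peak, (tmin, tmax, pol) in (("N1", (0.07, 0.14, -1)),
                                    ("P2", (0.15, 0.28, +1))):
        amp_a, lat_a = peak_in_window(A, tmin, tmax, pol)
        amp_c, lat_c = peak_in_window(comparison, tmin, tmax, pol)
        print(f"{label} {peak}: amplitude difference = {amp_a - amp_c:+.2f} uV, "
              f"latency difference = {1000 * (lat_a - lat_c):+.1f} ms")
```

With these assumed waveforms, the printed amplitude and latency differences come out similarly for the A vs. AV and A vs. AV − V comparisons, which is the pattern of results the abstract describes.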
Neuroscience Letters | 2015
Martijn Baart; Arthur G. Samuel
Auditory lexical processing starts within 200 ms after onset of the critical stimulus. Here, we used electroencephalography (EEG) to investigate whether (1) the so-called N200 effect can be triggered by single-item lexical context, and (2) such effects are robust against temporal violations of the signal. We presented items in which lexical status (i.e., is the stimulus a word or a pseudoword?) was determined at third syllable onset. The critical syllable could be naturally timed or delayed (by ∼440 or ∼800 ms). Across all conditions, we observed an effect of lexicality that started ∼200 ms after third syllable onset (i.e., an N200 effect in naturally timed items and a similar effect superimposed on the P2 for the delayed items). The results indicate that early lexical processes are robust against violations of temporal coherence.
PLOS ONE | 2015
Kathleen Shaw; Martijn Baart; Nicole Depowski; Heather Bortfeld
Although infant speech perception is often studied in isolated modalities, infants' experience with speech is largely multimodal (i.e., the speech sounds they hear are accompanied by articulating faces). Across two experiments, we tested infants' sensitivity to the relationship between the auditory and visual components of audiovisual speech in their native (English) and non-native (Spanish) language. In Experiment 1, infants' looking times were measured during a preferential looking task in which they saw two simultaneous visual speech streams articulating a story, one in English and the other in Spanish, while they heard either the English or the Spanish version of the story. In Experiment 2, looking times from another group of infants were measured as they watched single displays of congruent and incongruent combinations of English and Spanish audio and visual speech streams. Findings demonstrated an age-related increase in looking towards the native relative to the non-native visual speech stream when accompanied by the corresponding (native) auditory speech. This increase in native language preference did not appear to be driven by a difference in preference for native vs. non-native audiovisual congruence, as we observed no difference in looking times at the audiovisual streams in Experiment 2.