Publication


Featured research published by Jean-Luc Schwartz.


Cognition | 2004

Seeing to hear better: evidence for early audio-visual interactions in speech identification.

Jean-Luc Schwartz; Frédéric Berthommier; Christophe Savariaux

Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise [J. Acoust. Soc. Am. 109 (2001) 2272; J. Acoust. Soc. Am. 108 (2000) 1197]. However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audio-visual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audio-visual stimuli used here could not be differentiated by lip reading per se, since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audio-visual condition than in the audio-only condition, due to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture by a non-speech visual input with exactly the same time course, providing the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audio-visual speech identification is discussed in relation to recent neurophysiological data on audio-visual perception.


Journal of Neurolinguistics | 2012

The Perception for Action Control Theory (PACT): a perceptuo-motor theory of speech perception

Jean-Luc Schwartz; Anahita Basirat; Lucie Ménard; Marc Sato

It is a long-standing debate in the field of speech communication whether speech perception involves auditory or multisensory representations and processing, independently of any procedural knowledge about the production of speech units, or whether, on the contrary, it is based on a recoding of the sensory input in terms of articulatory gestures, as posited in the Motor Theory of Speech Perception. The discovery of mirror neurons in the last 15 years has strongly renewed interest in motor theories. However, while these neurophysiological data clearly reinforce the plausibility of a role for motor properties in perception, they could, in our view, lead to an incorrect de-emphasis of the role of perceptual shaping, which is crucial in speech communication. The Perception-for-Action-Control Theory (PACT) aims at defining a theoretical framework connecting, in a principled way, perceptual shaping and motor procedural knowledge in multisensory speech processing in the human brain. In this paper, the theory is presented in detail. We describe how it fits with behavioural and linguistic data, concerning firstly vowel systems in human languages and secondly the perceptual organization of the speech scene. Finally, a neuro-computational framework is presented in connection with recent data on the possible functional role of the motor system in speech perception.


Journal of the Acoustical Society of America | 2001

Audio-visual enhancement of speech in noise.

Laurent Girin; Jean-Luc Schwartz; Gang Feng

A key problem for telecommunication and human-machine communication systems is speech enhancement in noise. A number of techniques exist in this domain, all of them based on an acoustic-only approach, that is, the processing of the corrupted audio signal using audio information alone (from the corrupted signal only or with additional audio information). In this paper, an audio-visual approach to the problem is considered, since several studies have demonstrated that viewing the speaker's face improves message intelligibility, especially in noisy environments. A speech enhancement prototype system that takes advantage of visual inputs is developed. A filtering approach is proposed that uses enhancement filters estimated with the help of lip shape information. The estimation process is based on linear regression or simple neural networks trained on a corpus. A set of experiments assessed by Gaussian classification and perceptual tests demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel sequences) embedded in white Gaussian noise.
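
As a rough illustration of the filtering approach described above, the sketch below fits a linear regression from lip-shape parameters to per-band enhancement gains on a training corpus, then applies the predicted gains to noisy band energies. The feature set (lip width and height), the band-energy representation and the ideal-gain targets are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): estimating per-band enhancement
# filter gains from lip-shape parameters via linear regression.
# Band layout, feature names and training targets are assumptions.
import numpy as np

def train_gain_regressor(lip_feats, clean_spec, noisy_spec, eps=1e-8):
    """lip_feats: (T, 2) lip width/height per frame (assumed features).
    clean_spec / noisy_spec: (T, B) band energies of clean and noisy speech.
    Returns weights W (3, B) mapping [1, width, height] to per-band gains."""
    target_gains = clean_spec / (noisy_spec + eps)            # ideal band gains
    X = np.hstack([np.ones((lip_feats.shape[0], 1)), lip_feats])
    W, *_ = np.linalg.lstsq(X, target_gains, rcond=None)      # least squares fit
    return W

def enhance(noisy_spec, lip_feats, W):
    """Apply lip-predicted gains to the noisy band energies."""
    X = np.hstack([np.ones((lip_feats.shape[0], 1)), lip_feats])
    gains = np.clip(X @ W, 0.0, None)                          # no negative gains
    return noisy_spec * gains
```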


IEEE Transactions on Speech and Audio Processing | 1999

Comparing models for audiovisual fusion in a noisy-vowel recognition task

Pascal Teissier; Jordi Robert-Ribes; Jean-Luc Schwartz; Anne Guérin-Dugué

Audiovisual speech recognition involves fusing the audio and video sensors for phonetic identification. There are three basic ways to fuse data streams for taking a decision such as phoneme identification: data-to-decision, decision-to-decision, and data-to-data. This leads to four possible models for audiovisual speech recognition: direct identification in the first case, separate identification in the second, and two variants of the third (early integration) case, namely dominant recoding and motor recoding. However, no systematic comparison of these models is available in the literature. We propose an implementation of these four models and submit them to a benchmark test. To this end, we use a noisy-vowel corpus tested with two recognition paradigms in which the systems are tested at noise levels higher than those used for learning. In one of these paradigms, the signal-to-noise ratio (SNR) value is provided to the recognition systems; in the other, it is not. We also introduce a new criterion for evaluating performance, based on the information transmitted about individual phonetic features. In light of the compared performances of the four models under the two recognition paradigms, we discuss the advantages and drawbacks of these models, leading to proposals for data representation, fusion architecture, and control of the fusion process through sensor reliability.
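
As an illustration of one of the four architectures, the sketch below implements a simple version of the separate identification (decision fusion) model: each modality produces phoneme posteriors, and the two are combined under a reliability weight, in the spirit of controlling fusion through sensor reliability. The weighted geometric combination and the example numbers are assumptions, not the paper's exact formulation.

```python
# Minimal sketch (an assumption, not the paper's implementation) of the
# "separate identification" fusion model: audio and video classifiers each
# output phoneme posteriors, combined with a reliability weight gamma
# (e.g. derived from an SNR estimate) that controls trust in the audio stream.
import numpy as np

def fuse_separate_identification(p_audio, p_video, gamma=0.5):
    """p_audio, p_video: posterior vectors over the vowel classes.
    gamma in [0, 1]: audio reliability weight (1 = fully trusted audio).
    Returns fused, renormalized posteriors (weighted geometric combination)."""
    fused = (p_audio ** gamma) * (p_video ** (1.0 - gamma))
    return fused / fused.sum()

# Example: noisy audio hesitates between /i/ and /y/, video sees clear rounding.
p_a = np.array([0.45, 0.40, 0.15])   # /i/, /y/, /u/ posteriors from audio
p_v = np.array([0.05, 0.55, 0.40])   # video favours the rounded vowels /y/, /u/
print(fuse_separate_identification(p_a, p_v, gamma=0.4))
```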


Human Brain Mapping | 2012

Functional MRI assessment of orofacial articulators: Neural correlates of lip, jaw, larynx, and tongue movements

Krystyna Grabski; Laurent Lamalle; Coriandre Vilain; Jean-Luc Schwartz; Nathalie Vallée; Irène Troprès; Monica Baciu; Jean François Le Bas; Marc Sato

Compared with studies of complex coordinated orofacial actions, few neuroimaging studies have attempted to determine the shared and distinct neural substrates of supralaryngeal and laryngeal articulatory movements when these are performed independently. To determine the cortical and subcortical regions associated with supralaryngeal motor control, participants produced lip, tongue and jaw movements while undergoing functional magnetic resonance imaging (fMRI). For laryngeal motor activity, participants produced the steady-state vowel /i/. A sparse temporal sampling acquisition method was used to minimize movement-related artifacts. Three main findings were observed. First, the four tasks activated a set of largely overlapping, common brain areas: the sensorimotor and premotor cortices, the right inferior frontal gyrus, the supplementary motor area, the left parietal operculum and the adjacent inferior parietal lobule, the basal ganglia and the cerebellum. Second, differences between tasks were restricted to the bilateral auditory cortices and to the left ventrolateral sensorimotor cortex, with greater signal intensity for vowel vocalization. Finally, a dorso-ventral somatotopic organization of lip, jaw, vocalic/laryngeal, and tongue movements was observed within the primary motor and somatosensory cortices using individual region-of-interest (ROI) analyses. These results provide evidence for a core neural network involved in laryngeal and supralaryngeal motor control and further refine the sensorimotor somatotopic organization of orofacial articulators.


Philosophical Transactions of the Royal Society B | 2012

Multistability in perception: binding sensory modalities, an overview

Jean-Luc Schwartz; Nicolas Grimault; Jean-Michel Hupé; Brian C. J. Moore; Daniel Pressnitzer

This special issue presents research concerning multistable perception in different sensory modalities. Multistability occurs when a single physical stimulus produces alternations between different subjective percepts. Multistability was first described for vision, where it occurs, for example, when different stimuli are presented to the two eyes or for certain ambiguous figures. It has since been described for other sensory modalities, including audition, touch and olfaction. The key features of multistability are: (i) stimuli have more than one plausible perceptual organization; (ii) these organizations are not compatible with each other. We argue here that most if not all cases of multistability are based on competition in selecting and binding stimulus information. Binding refers to the process whereby the different attributes of objects in the environment, as represented in the sensory array, are bound together within our perceptual systems, to provide a coherent interpretation of the world around us. We argue that multistability can be used as a method for studying binding processes within and across sensory modalities. We emphasize this theme while presenting an outline of the papers in this issue. We end with some thoughts about open directions and avenues for further research.


Journal of the Acoustical Society of America | 2002

Auditory normalization of French vowels synthesized by an articulatory model simulating growth from birth to adulthood

Lucie Ménard; Jean-Luc Schwartz; Louis-Jean Boë; Sonia Kandel; Nathalie Vallée

The present article aims at exploring the invariant parameters involved in the perceptual normalization of French vowels. A set of 490 stimuli, including the ten French vowels /i y u e ø o ɛ œ ɔ a/ produced by an articulatory model simulating seven growth stages and seven fundamental frequency values, was submitted to 43 subjects in a perceptual identification test. The results confirm the important effect of the tonality distance between F1 and f0 on perceived height. It does not seem, however, that height perception involves a binary organization determined by the 3-3.5-Bark critical distance. Regarding place of articulation, the tonotopic distance between F1 and F2 appears to be the best predictor of the perceived front-back dimension. Nevertheless, the role of the difference between F2 and F3 remains important. Roundedness is also examined and correlated with the effective second formant, involving spectral integration of higher formants within the 3.5-Bark critical distance. The results shed light on the issue of perceptual invariance and can be interpreted as perceptual constraints imposed on speech production.
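
For concreteness, the sketch below shows the kind of computation behind the tonality-distance claim: converting frequencies to the Bark scale and measuring F1 - f0 against the 3-3.5 Bark critical distance. The Hz-to-Bark approximation used (Traunmüller, 1990) is an assumption; the paper may rely on a different conversion.

```python
# Minimal sketch: Bark-scale distance between F1 and f0, the abstract's main
# predictor of perceived vowel height. The Traunmueller (1990) Hz-to-Bark
# approximation is an assumption, not necessarily the paper's choice.
def hz_to_bark(f_hz: float) -> float:
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def f1_f0_bark_distance(f1_hz: float, f0_hz: float) -> float:
    """Tonality distance between F1 and f0 in Bark."""
    return hz_to_bark(f1_hz) - hz_to_bark(f0_hz)

# Example with a child-like f0 of 250 Hz (illustrative values):
print(f1_f0_bark_distance(300.0, 250.0))  # /i/-like F1: ~0.5 Bark, well under 3-3.5
print(f1_f0_bark_distance(750.0, 250.0))  # /a/-like F1: ~4.4 Bark, above the range
```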


international conference on acoustics, speech, and signal processing | 2006

An Analysis of Visual Speech Information Applied to Voice Activity Detection

David Sodoyer; Bertrand Rivet; Laurent Girin; Jean-Luc Schwartz; Christian Jutten

We present a new approach to the voice activity detection (VAD) problem for speech signals embedded in non-stationary noise. The method is based on automatic lipreading: the objective is to detect voice activity or non-activity by exploiting the coherence between the acoustic speech signal and the speaker's lip movements. From a comprehensive analysis of lip shape parameters during speech and non-speech events, we show that a single appropriate visual parameter, defined to characterize the lip movements, can be used to detect sections of voice activity or, more precisely, to detect silence sections. Detection scores obtained on spontaneous speech confirm the efficiency of the visual voice activity detector (VVAD).
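
A minimal sketch of how a single lip-motion parameter could drive such a detector is given below; the specific feature (summed absolute frame-to-frame change of lip width and height), the smoothing window and the threshold are assumptions for illustration, not the authors' exact parameter.

```python
# Minimal sketch (an assumption, not the authors' exact parameter): a visual
# voice activity detector driven by one lip-motion feature, the summed absolute
# frame-to-frame change of lip width and height, smoothed and thresholded.
import numpy as np

def visual_vad(lip_width, lip_height, win=15, threshold=0.5):
    """lip_width, lip_height: 1-D arrays of lip-shape parameters per video frame.
    Returns a boolean array: True where lip motion suggests voice activity."""
    motion = np.abs(np.diff(lip_width, prepend=lip_width[0])) \
           + np.abs(np.diff(lip_height, prepend=lip_height[0]))
    kernel = np.ones(win) / win
    smoothed = np.convolve(motion, kernel, mode="same")   # moving-average smoothing
    return smoothed > threshold
```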


Speech Communication | 2004

Visual perception of contrastive focus in reiterant French speech

Marion Dohen; Hélène Loevenbruck; Marie-Agnès Cathiard; Jean-Luc Schwartz

The aim of this paper is to study how contrastive focus is conveyed by prosody, both articulatorily and acoustically, and how viewers extract focus structure from visual prosodic realizations. Is the visual modality useful for the perception of prosody? An audiovisual corpus was recorded from a male native speaker of French. The sentences had a subject-verb-object (SVO) structure. Four contrastive focus conditions were studied: focus on each phrase (S, V or O) and broad focus. Normal and reiterant modes were recorded; only the latter was studied. An acoustic validation (fundamental frequency, duration and intensity) showed that the speaker had pronounced the utterances with a typical focused intonation on the focused phrase. Then, lip height and jaw opening were extracted from the video data. An articulatory analysis suggested a set of possible visual cues to focus for reiterant /ma/ speech: (a) prefocal lengthening; (b) large jaw opening and high opening velocities on all the focused syllables; (c) long lip closure for the first focused syllable; and (d) hypo-articulation (reduced jaw opening and duration) of the following phrases. A visual perception test was developed. It showed that (a) contrastive focus was well perceived visually for reiterant speech; (b) no training was necessary; and (c) subject focus was slightly easier to identify than the other focus conditions. We also found that when the visual cues identified in our articulatory analysis were present and marked, perception was enhanced. This suggests that the visual cues extracted from the corpus are probably the ones that are indeed perceptually salient.
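
As an illustration of the articulatory measures listed as visual cues, the short sketch below computes peak jaw opening and peak opening velocity from a jaw-aperture trajectory; the sampling rate and units are assumptions, and this is not the authors' analysis pipeline.

```python
# Minimal sketch (illustration only) of two of the cues named in the abstract:
# peak jaw opening and peak opening velocity over one syllable, computed from a
# jaw-aperture trajectory extracted from video. Frame rate is an assumption.
import numpy as np

def jaw_cues(jaw_opening, fps=50.0):
    """jaw_opening: 1-D array of jaw aperture (e.g. in mm) for one syllable.
    Returns (peak opening, peak opening velocity in units per second)."""
    velocity = np.gradient(jaw_opening) * fps      # frame-to-frame derivative
    return float(jaw_opening.max()), float(velocity.max())
```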


Speech Communication | 2004

Developing an audio-visual speech source separation algorithm

David Sodoyer; Laurent Girin; Christian Jutten; Jean-Luc Schwartz

Looking at the speaker's face helps one hear a speech signal better and extract it from competing sources before identification. This can motivate new speech enhancement or extraction techniques that exploit the audio-visual coherence of speech stimuli. In this paper, a novel algorithm that plugs audio-visual coherence, estimated by statistical tools, into classical blind source separation algorithms is presented and assessed. We show, in the case of additive mixtures, that this algorithm performs better than classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence enables the algorithm to focus on the speech source to be extracted. It may also be used at the output of a classical source separation algorithm, to select the "best" sensor with reference to a target source.
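
The sketch below illustrates one way audio-visual coherence could be used at the output of a separation stage: among the separated signals, pick the one whose energy envelope best correlates with the lip-opening trajectory. The correlation-based coherence measure is a stand-in assumption; the paper estimates audio-visual coherence with a trained statistical model.

```python
# Minimal sketch (an illustration, not the paper's statistical model): selecting,
# among the outputs of a blind source separation stage, the one most coherent
# with the filmed speaker's lip movements, via envelope/lip correlation.
import numpy as np

def select_av_source(separated, lip_opening, frame_len=160):
    """separated: (n_sources, n_samples) outputs of a BSS algorithm.
    lip_opening: (n_frames,) lip aperture per video frame.
    Assumes the audio covers at least n_frames * frame_len samples.
    Returns the index of the source most coherent with the lip movements."""
    n_frames = len(lip_opening)
    scores = []
    for s in separated:
        frames = s[: n_frames * frame_len].reshape(n_frames, frame_len)
        envelope = np.sqrt((frames ** 2).mean(axis=1))        # per-frame energy
        scores.append(np.corrcoef(envelope, lip_opening)[0, 1])
    return int(np.argmax(scores))
```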

Collaboration


Dive into Jean-Luc Schwartz's collaborations.

Top Co-Authors

Marc Sato (University of Grenoble)
Louis-Jean Boë (Centre national de la recherche scientifique)
Julien Diard (Centre national de la recherche scientifique)
Christophe Savariaux (Centre national de la recherche scientifique)
Amélie Rochet-Capellan (Centre national de la recherche scientifique)
Pascal Perrier (Centre national de la recherche scientifique)
Pierre Badin (Centre national de la recherche scientifique)