Publication


Featured research published by Hervé Glotin.


International Conference on Acoustics, Speech, and Signal Processing | 2001

Weighting schemes for audio-visual fusion in speech recognition

Hervé Glotin; D. Vergyri; Chalapathy Neti; Gerasimos Potamianos; Juergen Luettin

We demonstrate an improvement in state-of-the-art large vocabulary continuous speech recognition (LVCSR) performance, under clean and noisy conditions, by using visual information in addition to the traditional audio information. We take a decision fusion approach to the audio-visual information, in which the single-modality (audio-only and visual-only) HMM classifiers are combined to recognize audio-visual speech. More specifically, we tackle the problem of estimating the appropriate combination weights for each of the modalities. Two different techniques are described: the first uses an automatically extracted estimate of the audio stream reliability to modify the weights for each modality (both clean and noisy audio results are reported), while the second is a discriminative model combination approach in which weights on pre-defined model classes are optimized to minimize word error rate (WER) (clean-audio results only).
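As a rough illustration of the weighted decision fusion described in this abstract, the sketch below combines per-class log-likelihoods from an audio classifier and a visual classifier using a single stream weight. The function names, the `audio_weight` parameter, and the toy scores are assumptions made for illustration and are not taken from the paper, which estimates its weights from audio reliability or by discriminative training.

```python
def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Combine per-class log-likelihoods from two single-modality classifiers.

    audio_weight is a hypothetical stream weight in [0, 1]; the visual
    stream receives the remaining 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_ll.keys() != visual_ll.keys():
        raise ValueError("both streams must score the same classes")
    return {
        label: audio_weight * audio_ll[label]
        + (1.0 - audio_weight) * visual_ll[label]
        for label in audio_ll
    }


def recognize(audio_ll, visual_ll, audio_weight):
    """Return the class with the highest fused score."""
    fused = fuse_log_likelihoods(audio_ll, visual_ll, audio_weight)
    return max(fused, key=fused.get)


# Toy scores (made up): the audio stream slightly prefers "bat",
# while the visual stream clearly prefers "pat".
audio_scores = {"bat": -4.0, "pat": -4.5}
visual_scores = {"bat": -6.0, "pat": -2.0}

# Trusting the audio stream heavily keeps its choice.
print(recognize(audio_scores, visual_scores, audio_weight=0.9))
# Weighting both streams equally lets the visual evidence decide.
print(recognize(audio_scores, visual_scores, audio_weight=0.5))
```

Lowering `audio_weight` when the audio is judged unreliable shifts the decision toward the visual stream, which is the general idea behind reliability-driven weighting.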


Multimedia Signal Processing | 2001

Large-vocabulary audio-visual speech recognition: a summary of the Johns Hopkins Summer 2000 Workshop

Chalapathy Neti; Gerasimos Potamianos; Juergen Luettin; Iain A. Matthews; Hervé Glotin; Dimitra Vergyri

We report a summary of the Johns Hopkins Summer 2000 Workshop on audio-visual automatic speech recognition (ASR) in the large-vocabulary, continuous speech domain. Two problems of audio-visual ASR were mainly addressed: visual feature extraction and audio-visual information fusion. First, image transform and model-based visual features were considered, obtained by means of the discrete cosine transform (DCT) and active appearance models, respectively. The former were demonstrated to yield superior automatic speech reading. Subsequently, a number of feature fusion and decision fusion techniques for combining the DCT visual features with traditional acoustic ones were implemented and compared. Hierarchical discriminant feature fusion and asynchronous decision fusion by means of the multi-stream hidden Markov model consistently improved ASR for both clean and noisy speech. Compared to an equivalent audio-only recognizer, introducing the visual modality reduced ASR word error rate by 7% relative in clean speech, and by 27% relative at an 8.5 dB SNR audio condition.


Conference of the International Speech Communication Association | 2002

Multichannel signal separation for cocktail party speech recognition: a dynamic recurrent network

Seungjin Choi; Heonseok Hong; Hervé Glotin; Frédéric Berthommier

This paper addresses a method of multichannel signal separation (MSS) with its application to cocktail party speech recognition. First, we present a fundamental principle for multichannel signal separation which uses the spatial independence of located sources as well as the temporal dependence of speech signals. Second, for practical implementation of the signal separation filter, we consider a dynamic recurrent network and develop a simple new learning algorithm. The performance of the proposed method is evaluated in terms of word recognition error rate (WER) in a large speech recognition experiment. The results show that our proposed method dramatically improves the word recognition performance in the case of two simultaneous speech inputs, and that a timing effect is involved in the segregation process.


IEEE Transactions on Image Processing | 2012

Cooperative Sparse Representation in Two Opposite Directions for Semi-Supervised Image Annotation

Zhong-Qiu Zhao; Hervé Glotin; Zhao Xie; Jun Gao; Xindong Wu

Recent studies have shown that sparse representation (SR) can deal well with many computer vision problems, and its kernel version has powerful classification capability. In this paper, we address the application of a cooperative SR to semi-supervised image annotation, which can increase the number of labeled images available for training image classifiers. Given a set of labeled (training) images and a set of unlabeled (test) images, the usual SR method, which we call forward SR, represents each unlabeled image with several labeled ones and then annotates the unlabeled image according to the annotations of these labeled ones. However, to the best of our knowledge, SR in the opposite direction has not been addressed so far. This method, which we call backward SR, represents each labeled image with several unlabeled images and then annotates any unlabeled image according to the annotations of the labeled images that it was selected to represent. In this paper, we explore how much the backward SR can contribute to image annotation and complement the forward SR. Co-training, a semi-supervised method in which two classifiers improve each other only if they are relatively independent, is then adopted to test this complementarity between the two SRs in opposite directions. Finally, the co-training of two SRs in kernel space builds a cooperative kernel sparse representation (Co-KSR) method for image annotation. Experimental results and analyses show that two KSRs in opposite directions are complementary, and that Co-KSR improves considerably over either of them, with an image annotation performance better than other state-of-the-art semi-supervised classifiers such as transductive support vector machine, local and global consistency, and Gaussian fields and harmonic functions. Comparative experiments with a nonsparse solution are also performed to show that sparsity plays an important role in the cooperation of image representations in two opposite directions. This paper extends the application of SR in image annotation and retrieval.


International Workshop on Machine Learning for Signal Processing | 2016

Bird detection in audio: A survey and a challenge

Dan Stowell; Michael D. Wood; Yannis Stylianou; Hervé Glotin

Many biological monitoring projects rely on acoustic detection of birds. Despite increasingly large datasets, this detection is often manual or semi-automatic, requiring manual tuning/postprocessing. We review the state of the art in automatic bird sound detection, and identify a widespread need for tuning-free and species-agnostic approaches. We introduce new datasets and an IEEE research challenge to address this need, to make possible the development of fully automatic algorithms for bird sound detection.


Methods in Ecology and Evolution | 2014

Monitoring temporal change of bird communities with dissimilarity acoustic indices

Laurent Lellouch; Sandrine Pavoine; Frédéric Jiguet; Hervé Glotin; Jérôme Sueur

Part of biodiversity assessment and monitoring consists of estimating and tracking changes in the species composition and abundance of animal communities. Such a task requires extensive sampling over a broad time scale, which is difficult to achieve with classical survey methods. Acoustics may offer an alternative to the usual techniques by recording the sound produced by vocal animals. Animal species that use sound for communication (singing and/or calling) establish an acoustic community when they sing at the same time and at a particular place. Estimating the dynamics of the acoustic community could provide indirect cues on what drives changes in community composition and species abundance. Here, new methods were developed to estimate the changes in bird communities recorded at three temperate woodland sites in France. Both field recordings and simulated data were used to test whether acoustic dissimilarity indices can be used to estimate changes in the composition of the community. Four dissimilarity indices found in the literature, and a new one named Dcf, were tested on auditory spectra after transformation to the Mel scale, rather than on classical Fourier frequency spectra. All indices were compared with each other and with compositional indices. The results show that the bird communities occurring at the three sites were dynamic, with changes in composition over time. Dissimilarities computed on simulated acoustic communities were correlated with compositional dissimilarity, but those computed on field-recorded communities could not be considered faithful estimators of variations in community composition. However, the indices indicate important dates in community changes around mid-April that were also seen in the composition dynamics. Acoustic dissimilarity indices failed to track accurately the changes in species composition of the bird communities. However, these indices, which are easy to compute, still provide information on the acoustic dynamics of the bird community. Acoustics might be considered not as a proxy of compositional diversity but rather as another facet of animal diversity that needs to be studied and preserved in its own right.


Multimedia Tools and Applications | 2005

Enhancement of Textual Images Classification Using Segmented Visual Contents for Image Search Engine

Sabrina Tollari; Hervé Glotin; Jacques Le Maitre

This paper deals with the use of the dependencies between the textual indexation of an image (a set of keywords) and its visual indexation (colour and shape features). Experiments are conducted on a corpus of photographs from a press agency (EDITING) and on another corpus of animal and landscape photographs (COREL). Both are manually indexed by keywords. Keywords of the news photos are extracted from a hierarchically structured thesaurus. Keywords of the COREL corpus are semantically linked using the WordNet database. A semantic clustering of the photos is constructed from their textual indexation. We use two different visual segmentation schemes: one is based on areas of interest, the other on blobs of homogeneous colour. Both segmentation schemes are used to evaluate the performance of a content-based image retrieval system combining textual and visual descriptions. Results of visuo-textual classification show an improvement of 50% over classification using only textual information. Finally, we show how to apply this system to enhance a web image search engine. To this end, we illustrate a method for selecting only the accurate images returned by a textual query.


Journal of Physiology-Paris | 2010

An Adaptive Resonance Theory account of the implicit learning of orthographic word forms

Hervé Glotin; P. Warnier; Frédéric Dandurand; Stéphane Dufau; Bernard Lété; Claude Touzet; Johannes C. Ziegler; Jonathan Grainger

An Adaptive Resonance Theory (ART) network was trained to identify unique orthographic word forms. Each word input to the model was represented as an unordered set of ordered letter pairs (open bigrams) that implement a flexible prelexical orthographic code. The network learned to map this prelexical orthographic code onto unique word representations (orthographic word forms). The network was trained on a realistic corpus of reading textbooks used in French primary schools. The amount of training was strictly identical to children's exposure to reading material from grade 1 to grade 5. Network performance was examined at each grade level. Adjustment of the learning and vigilance parameters of the network allowed us to reproduce the developmental growth of word identification performance seen in children. The network exhibited a word frequency effect and was found to be sensitive to the order of presentation of word inputs, particularly with low frequency words. These words were better learned with a randomized presentation order compared with the order of presentation in the school books. These results open up interesting perspectives for the application of ART networks in the study of the dynamics of learning to read.
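The open-bigram code mentioned in this abstract (an unordered set of ordered letter pairs) can be sketched in a few lines. The function name `open_bigrams` and the optional `max_gap` limit on intervening letters are assumptions for illustration; the abstract does not say whether the model restricts the distance between the two letters.

```python
from itertools import combinations


def open_bigrams(word, max_gap=None):
    """Return the set of ordered letter pairs (open bigrams) in `word`.

    A pair "ab" is included when letter a occurs before letter b.
    `max_gap` is a hypothetical option limiting how many letters may sit
    between the two; None means no limit.
    """
    pairs = set()
    for i, j in combinations(range(len(word)), 2):
        if max_gap is None or j - i - 1 <= max_gap:
            pairs.add(word[i] + word[j])
    return pairs


print(sorted(open_bigrams("word")))
# A transposed-letter string shares most of its pairs with the original,
# which is what makes this code tolerant of small changes in letter order.
print(len(open_bigrams("word") & open_bigrams("wrod")))
```

Because the result is a set, the order in which pairs are produced carries no information; only the relative order of the two letters inside each pair does.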


Science in China Series F: Information Sciences | 2014

Optimizing widths with PSO for center selection of Gaussian radial basis function networks

Zhong-Qiu Zhao; Xindong Wu; CanYi Lu; Hervé Glotin; Jun Gao

The radial basis function (RBF) centers play different roles in determining the classification capability of a Gaussian radial basis function neural network (GRBFNN) and should hold different width values. However, it is very hard and time-consuming to optimize the centers and widths at the same time. In this paper, we introduce a new insight into this problem. We explore the impact of the definition of widths on the selection of the centers, propose an optimization algorithm for the RBF widths in order to select proper centers from the center candidate pool, and improve the classification performance of the GRBFNN. The design of the objective function of the optimization algorithm is based on the local mapping capability of each Gaussian RBF. Further, in the design of the objective function, we also handle the imbalance problem, which may occur even when different local regions have the same number of examples. Finally, the recursive orthogonal least square (ROLS) and genetic algorithm (GA), which are usually adopted to optimize the RBF centers, are separately used to select the centers from the center candidates with the initialized widths, in order to test the validity of our proposed width initialization strategy for the selection of centers. Our experimental results show that, compared with the heuristic width setting method, the width optimization strategy makes the selected centers more appropriate and improves the classification performance of the GRBFNN. Moreover, the GRBFNN constructed by our method can attain better classification performance than the RBF LS-SVM, which is a state-of-the-art classifier.
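To make the role of per-center widths concrete, the sketch below evaluates a small Gaussian RBF network in which every center has its own width. It assumes the common parameterization exp(-||x - c||^2 / (2 * width^2)); the function names and toy values are illustrative and are not taken from the paper, and no width optimization (PSO or otherwise) is performed here.

```python
import math


def gaussian_rbf(x, center, width):
    """Activation of one Gaussian RBF unit with its own width."""
    if width <= 0:
        raise ValueError("width must be positive")
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbf_network_output(x, centers, widths, weights):
    """Weighted sum of Gaussian RBF activations, one width per center."""
    return sum(
        w * gaussian_rbf(x, c, s)
        for c, s, w in zip(centers, widths, weights)
    )


# Toy network (made-up values): two centers with different widths.
centers = [[0.0, 0.0], [2.0, 0.0]]
widths = [0.5, 1.5]
weights = [1.0, -1.0]

print(rbf_network_output([0.0, 0.0], centers, widths, weights))
print(rbf_network_output([2.0, 0.0], centers, widths, weights))
```

A larger width makes a unit respond over a wider region around its center, which is why the choice of widths affects which candidate centers are worth keeping.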


European Journal of Cognitive Psychology | 2010

A developmental perspective on visual word recognition: New evidence and a self-organising model

Stéphane Dufau; Bernard Lété; Claude Touzet; Hervé Glotin; Johannes C. Ziegler; Jonathan Grainger

This study investigated the developmental trajectory of two marker effects of visual word recognition, word frequency and orthographic neighbourhood effects, in French primary school children from Grades 1 to 5. Frequency and neighbourhood size were estimated using a realistic developmental database, which also allowed us to control for the effects of age-of-acquisition. A lexical decision task was used because the focus of this study was orthographic development. The results showed that frequency had clear effects that diminished with development, whereas orthographic neighbourhood had no significant influence at any grade. A self-organising neural network was trained on the realistic developmental corpus. The model successfully simulated the overall pattern found with children, including the absence of neighbourhood size effects. The self-organising neural network outperformed the classic interactive activation model in which frequency effects are simulated in a static way. These results highlight the potentially important role of unsupervised learning for the development of orthographic word forms.

Collaboration


Dive into Hervé Glotin's collaboration.

Top Co-Authors

Sébastien Paris

Centre national de la recherche scientifique

Zhong-Qiu Zhao

Hefei University of Technology

Pascale Giraudet

Centre national de la recherche scientifique

Sabrina Tollari

Centre national de la recherche scientifique

Xanadu Halkias

Centre national de la recherche scientifique

Frédéric Berthommier

Centre national de la recherche scientifique

Henning Müller

University of Applied Sciences Western Switzerland
