Publications


Featured research published by Stéphane Dupont.


International Conference on Acoustics, Speech, and Signal Processing | 1997

Subband-based speech recognition

Stéphane Dupont

In the framework of hidden Markov models (HMM) or hybrid HMM/artificial neural network (ANN) systems, we present a new approach to automatic speech recognition (ASR). The general idea is to divide the full frequency band (represented in terms of critical bands) into several subbands, compute phone probabilities for each subband on the basis of subband acoustic features, perform dynamic programming independently for each band, and merge the subband recognizers (recombining the respective, possibly weighted, scores) at some segmental level corresponding to temporal anchor points. The results presented in this paper confirm preliminary tests reported earlier. On both isolated word and continuous speech tasks, we show that even with quite simple recombination strategies, this subband ASR approach yields performance at least comparable to full-band systems on clean speech while providing better robustness to narrowband noise.
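
As a hedged illustration of the merging step described above, here is a minimal numpy sketch of weighted score recombination across subbands; the candidate scores, weights, and the linear merging rule are hypothetical stand-ins for the possibly weighted recombination the paper leaves open.

```python
import numpy as np

# Minimal sketch of the subband recombination idea (hypothetical values):
# each subband recognizer produces a score for every candidate word, and
# the scores are merged with a weighted sum at a segmental anchor point.

def recombine(subband_scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """subband_scores: (n_bands, n_candidates) log-domain scores from the
    per-band dynamic programming passes; weights: (n_bands,) reliability
    weights (e.g. lower for bands corrupted by narrowband noise)."""
    return weights @ subband_scores  # (n_candidates,) merged scores

# Toy example: 4 subbands scoring 3 candidate words; band 3 is noisy,
# so it is down-weighted before merging.
scores = np.array([[-1.0, -2.5, -3.0],
                   [-1.2, -2.2, -2.9],
                   [-4.0, -0.5, -0.6],   # corrupted band
                   [-0.9, -2.7, -3.1]])
weights = np.array([0.3, 0.3, 0.1, 0.3])
merged = recombine(scores, weights)
print("best candidate:", int(np.argmax(merged)))
```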


Content-Based Multimedia Indexing | 2015

DeepSketch: Deep convolutional neural networks for sketch recognition and similarity search

Omar Seddati; Stéphane Dupont; Saïd Mahmoudi

In this paper, we present a system for sketch classification and similarity search. We use deep convolutional neural networks (ConvNets), the state of the art in image recognition, which enable both classification and medium/high-level feature extraction. We use ConvNet features as a basis for similarity search with k-Nearest Neighbors (kNN). Evaluations are performed on the TU-Berlin benchmark. Our main contributions are threefold: first, we use ConvNets, in contrast to most previous approaches, which rely essentially on hand-crafted features. Second, we propose a ConvNet that is both more accurate and lighter/faster than the only two previous attempts at using ConvNets for hand-sketch recognition, reaching an accuracy of 75.42%. Third, we show that, as with natural images, ConvNets allow the extraction of medium-level and high-level features (depending on the depth) which can be used for similarity search.
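
Below is a minimal sketch of the similarity-search stage described in the abstract, assuming feature vectors have already been extracted from a hidden ConvNet layer; the feature dimensionality and the random `gallery`/`query` arrays are hypothetical placeholders.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Sketch of the kNN similarity-search stage over ConvNet features.
# `gallery` and `query` are hypothetical stand-ins for real features.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 512))   # features of 1000 indexed sketches
query = rng.normal(size=(1, 512))        # feature of a query sketch

index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(gallery)
distances, neighbors = index.kneighbors(query)
print("top-5 most similar sketches:", neighbors[0])
```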


Affective Computing and Intelligent Interaction | 2015

Multimodal data collection of human-robot humorous interactions in the Joker project

Laurence Devillers; Sophie Rosset; Guillaume Dubuisson Duplessis; Mohamed A. Sehili; Lucile Bechade; Agnes Delaborde; Clément Gossart; Vincent Letard; Fan Yang; Yücel Yemez; Bekir Berker Turker; T. Metin Sezgin; Kevin El Haddad; Stéphane Dupont; Daniel Luzzati; Yannick Estève; Emer Gilmartin; Nick Campbell

Thanks to its remarkable ability to convey amusement and engagement, laughter is one of the most important social markers in human interaction. Laughing together helps set a positive atmosphere and fosters new relationships. This paper presents a data collection of social interaction dialogs involving humor between a human participant and a robot. In this work, interaction scenarios have been designed to study social markers such as laughter. They have been implemented within two automatic systems developed in the Joker project: a social dialog system using paralinguistic cues and a task-based dialog system using linguistic content. One of the major contributions of this work is to provide a context for studying human laughter produced during human-robot interaction. The collected data will be used to build a generic intelligent user interface providing a multimodal dialog system with social communication skills, including humor and other informal socially oriented behaviors. This system will emphasize the fusion of verbal and non-verbal channels for emotional and social behavior perception, interaction, and generation capabilities.


Conference on Multimedia Modeling | 2015

IMOTION — A Content-Based Video Retrieval Engine

Luca Rossetto; Ivan Giangreco; Heiko Schuldt; Stéphane Dupont; Omar Seddati; T. Metin Sezgin; Yusuf Sahillioglu

This paper introduces the IMOTION system, a sketch-based video retrieval engine supporting multiple query paradigms. For vector space retrieval, the IMOTION system exploits a large variety of low-level image and video features, as well as high-level spatial and temporal features that can all be jointly used in any combination. In addition, it supports dedicated motion features to allow for the specification of motion within a video sequence. For query specification, the IMOTION system supports query-by-sketch interactions (users provide sketches of video frames), motion queries (users specify motion across frames via partial flow fields), query-by-example (based on images) and any combination of these, and provides support for relevance feedback.
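
As a hedged sketch of how several feature types might be "jointly used in any combination", the following toy late-fusion routine normalizes per-feature distances and blends them with user-chosen weights; the feature names, weights, and fusion rule are illustrative assumptions, not the IMOTION implementation.

```python
import numpy as np

# Hypothetical late-fusion sketch: each feature type yields a distance
# between the query sketch and every video shot; distances are min-max
# normalized and combined with user-chosen weights.

def fuse(distances: dict, weights: dict) -> np.ndarray:
    total = None
    for name, d in distances.items():
        d = (d - d.min()) / (np.ptp(d) + 1e-9)   # normalize to [0, 1]
        term = weights[name] * d
        total = term if total is None else total + term
    return total

rng = np.random.default_rng(1)
dists = {"color": rng.random(100), "edge": rng.random(100), "motion": rng.random(100)}
w = {"color": 0.5, "edge": 0.3, "motion": 0.2}
ranking = np.argsort(fuse(dists, w))             # smaller distance = better
print("best matching shots:", ranking[:5])
```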


International Conference on Acoustics, Speech, and Signal Processing | 2015

Speech-laughs: An HMM-based approach for amused speech synthesis

Kevin El Haddad; Stéphane Dupont; Jérôme Urbain; Thierry Dutoit

This paper presents an HMM-based synthesis approach for speech-laughs. The starting point of this project was the observation that smile and laughter bursts co-occur in varying proportions within amused speech utterances. A corpus with three complementary speaking styles was used to train the underlying HMM models: neutral speech, speech-smile, and laughter in different articulatory configurations. Two types of speech-laughs were then synthesized: one combining neutral speech and laughter bursts, and the other combining speech-smile and laughter bursts. The synthesized stimuli were then rated in terms of perceived amusement and naturalness. Results show the compound effect of laughter bursts and smile on both amusement and naturalness, and suggest interesting perspectives.
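
The system itself is HMM-based and parametric, so the following is only a toy waveform-splicing illustration of the underlying idea of interleaving laughter bursts within an utterance, not the paper's synthesis pipeline; all signals and insertion points are hypothetical.

```python
import numpy as np

# Toy illustration only: splice laughter bursts into a speech waveform
# at given sample positions (e.g. detected pauses).

def insert_bursts(speech, bursts, positions):
    out, prev = [], 0
    for burst, pos in sorted(zip(bursts, positions), key=lambda x: x[1]):
        out += [speech[prev:pos], burst]
        prev = pos
    out.append(speech[prev:])
    return np.concatenate(out)

sr = 16000
speech = np.zeros(sr * 2)                      # 2 s of placeholder "speech"
burst = 0.1 * np.random.randn(sr // 4)         # 250 ms placeholder "burst"
mixed = insert_bursts(speech, [burst], [sr])   # insert at the 1 s mark
print(mixed.shape)
```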


Conference on Multimedia Modeling | 2017

Enhanced Retrieval and Browsing in the IMOTION System

Luca Rossetto; Ivan Giangreco; Claudiu Tănase; Heiko Schuldt; Stéphane Dupont; Omar Seddati

This paper presents the IMOTION system in its third version. While still focusing on sketch-based retrieval, we improve upon the semantic retrieval capabilities introduced in the previous version by adding more detectors and improving the interface for semantic query specification. Compared to the previous year's system, we increase the role of features obtained from deep neural networks in three areas: semantic class labels for more entry-level concepts, hidden-layer activation vectors for query-by-example, and a 2D display of semantic similarity results. The new graph-based result navigation interface further enriches the system's browsing capabilities. The updated database storage system ADAMpro, designed from the ground up for large-scale multimedia applications, ensures scalability to steadily growing collections.


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Blind Speech Separation and Enhancement With GCC-NMF

Sean U. N. Wood; Jean Rouat; Stéphane Dupont; Gueorgui Pironkov

We present a blind source separation algorithm named GCC-NMF that combines unsupervised dictionary learning via non-negative matrix factorization (NMF) with spatial localization via the generalized cross-correlation (GCC) method. Dictionary learning is performed on the mixture signal, with separation subsequently achieved by grouping dictionary atoms, at each point in time, according to their spatial origins. The resulting source separation algorithm is simple yet flexible, requiring no prior knowledge or information. Separation quality is evaluated on three tasks using stereo recordings from the publicly available SiSEC signal separation evaluation campaign: 3 and 4 concurrent speakers in reverberant environments, speech mixed with real-world background noise, and noisy recordings of a moving speaker. Performance is quantified using perceptually motivated and SNR-based measures with the PEASS and BSS Eval toolkits, respectively. We evaluate the effects of model parameters on separation quality and compare our approach with other unsupervised and semi-supervised speech separation and enhancement approaches. We show that GCC-NMF is a flexible source separation algorithm, outperforming task-specific approaches in each of the three settings, including both blind approaches and several informed approaches that require prior knowledge or information.
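
A highly simplified sketch of the GCC-NMF idea follows (not the authors' reference implementation): NMF atoms are learned on the mixture spectrogram, each atom is localized per frame with a GCC-PHAT angular spectrum, and atoms matching the target time difference of arrival (TDOA) drive a Wiener-like mask. The window size, TDOA grid, and tolerance below are assumed values.

```python
import numpy as np
from scipy.signal import stft, istft
from sklearn.decomposition import NMF

def gcc_nmf(left, right, fs, target_tdoa, n_atoms=32, n_tdoa=64):
    # 1. STFT of both channels; NMF dictionary learned on mixture magnitude
    f, t, XL = stft(left, fs, nperseg=1024)
    _, _, XR = stft(right, fs, nperseg=1024)
    V = np.abs(XL) + np.abs(XR)
    nmf = NMF(n_components=n_atoms, max_iter=200, init="random", random_state=0)
    W = nmf.fit_transform(V)                          # (freq, atoms)
    H = nmf.components_                               # (atoms, frames)

    # 2. GCC-PHAT phase term evaluated on a grid of candidate TDOAs
    cross = XL * np.conj(XR)
    phat = cross / (np.abs(cross) + 1e-12)            # (freq, frames)
    taus = np.linspace(-8e-4, 8e-4, n_tdoa)           # +-0.8 ms candidates
    steer = np.exp(2j * np.pi * f[:, None] * taus[None, :])  # (freq, tdoa)

    # 3. Localize each atom at each frame: weight GCC-PHAT by atom spectrum
    score = np.einsum("fk,fti->kti", W,
                      (phat[:, :, None] * steer[:, None, :]).real)
    atom_tdoa = taus[score.argmax(axis=2)]            # (atoms, frames)

    # 4. Keep atoms near the target TDOA; build a soft Wiener-like mask
    keep = np.isclose(atom_tdoa, target_tdoa, atol=1e-4)
    mask = (W @ (H * keep)) / (W @ H + 1e-12)
    _, y = istft(mask * XL, fs, nperseg=1024)
    return y
```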


IEEE International Conference on Automatic Face and Gesture Recognition | 2015

An HMM-based speech-smile synthesis system: An approach for amusement synthesis

Kevin El Haddad; Stéphane Dupont; Nicolas D'Alessandro; Thierry Dutoit

This paper presents an HMM-based speech-smile synthesis system. To this end, databases of three speech styles were recorded. The system was used to study to what extent synthesized speech-smiles (defined as Duchenne smiles in our work) and spread-lips (speech modulated by spreading the lips) communicate amusement. Our evaluation results show that the synthesized speech-smile sentences are perceived as more amused than the spread-lips ones. An acoustic analysis of the pitch and the first two formants is also provided.


International Conference on Multimedia Retrieval | 2017

Quadruplet Networks for Sketch-Based Image Retrieval

Omar Seddati; Stéphane Dupont; Saïd Mahmoudi

Freehand sketches are a simple and powerful tool for communication. They are easily recognized across cultures and suitable for various applications. In this paper, we use deep convolutional neural networks (ConvNets) to address sketch-based image retrieval (SBIR). We first train our ConvNets on sketch and image object recognition on a large-scale benchmark for SBIR (the Sketchy database). We then conduct a comprehensive study of ConvNet features for SBIR, using a kNN similarity search paradigm in the ConvNet feature space. In contrast to recent SBIR works, we propose a new architecture, the quadruplet network, which enhances ConvNet features for SBIR. This new architecture enables ConvNets to extract more robust global and local features. We evaluate our approach on three large-scale datasets. Our quadruplet networks outperform the previous state of the art on two of them by a significant margin and give competitive results on the third. Our system achieves a recall of 42.16% (at k=1) on the Sketchy database (more than 5% improvement), a Kendall score of 43.28 on the TU-Berlin SBIR benchmark (close to a 6-point improvement), and a mean average precision (MAP) of 32.16% on Flickr15k (a category-level SBIR benchmark).
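
The following PyTorch sketch shows one plausible quadruplet ranking loss; the exact formulation, margins, and sampling used in the paper may differ, and the embeddings here are random placeholders for the outputs of the branch networks.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of a quadruplet ranking loss (the paper's exact
# formulation may differ). Each quadruplet holds a sketch anchor, a
# matching image, a non-matching image, and a second, non-matching
# sketch; two triplet-style hinge terms are combined.

def quadruplet_loss(s_anchor, img_pos, img_neg, s_neg, m1=0.5, m2=0.25):
    d = lambda a, b: F.pairwise_distance(a, b)          # Euclidean distance
    # The anchor sketch should be closer to its image than to a wrong image,
    t1 = F.relu(d(s_anchor, img_pos) - d(s_anchor, img_neg) + m1)
    # and closer to its image than an unrelated sketch is to that image.
    t2 = F.relu(d(s_anchor, img_pos) - d(s_neg, img_pos) + m2)
    return (t1 + t2).mean()

# Toy usage with random 128-d embeddings from hypothetical branch networks
e = lambda: torch.randn(8, 128, requires_grad=True)
loss = quadruplet_loss(e(), e(), e(), e())
loss.backward()
print(float(loss))
```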


International Conference on Multimedia and Expo | 2013

Nonlinear dimensionality reduction approaches applied to music and textural sounds

Stéphane Dupont; Thierry Ravet; Cécile Picard-Limpens; Christian Frisson

Recently, various dimensionality reduction approaches have been proposed as alternatives to PCA or LDA. These improved approaches do not rely on a linearity assumption and are hence capable of discovering more complex embeddings within different regions of the data sets. Despite their success on artificial datasets, it is not straightforward to predict which technique is the most appropriate for a given real dataset. In this paper, we empirically evaluate recent techniques on two real audio use cases: musical instrument loops used in music production and sound effects used in sound editing. ISOMAP and t-SNE are compared to PCA on a visualization problem where the data is reduced to a two-dimensional view. Various evaluation measures are used: classification performance, as well as trustworthiness/continuity, assessing the preservation of neighborhoods. Although PCA and ISOMAP can yield good continuity performance even locally (samples close in the original space remain close in the low-dimensional one), they fail to preserve the structure of the data well enough to ensure that distinct subgroups remain separate in the visualization. We show that t-SNE presents the best performance, and can even be beneficial as a pre-processing stage for improving classification when the amount of labeled data is low.
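
The comparison protocol can be sketched with scikit-learn, here on the digits dataset as a stand-in for the audio features used in the paper: embed to two dimensions with PCA, ISOMAP, and t-SNE, then score neighborhood preservation with the trustworthiness measure.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, TSNE, trustworthiness

# Stand-in data: 8x8 digit images instead of the paper's audio features.
X, _ = load_digits(return_X_y=True)

embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "ISOMAP": Isomap(n_components=2).fit_transform(X),
    "t-SNE": TSNE(n_components=2, random_state=0).fit_transform(X),
}
for name, Y in embeddings.items():
    # Trustworthiness ~ 1.0 means 2-D neighbors were neighbors originally.
    print(f"{name}: trustworthiness = {trustworthiness(X, Y, n_neighbors=12):.3f}")
```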
