Publications


Featured research published by Yazid Attabi.


IEEE Transactions on Affective Computing | 2013

Anchor Models for Emotion Recognition from Speech

Yazid Attabi; Pierre Dumouchel

In this paper, we study the effectiveness of anchor models applied to the multiclass problem of emotion recognition from speech. In an anchor-model system, an emotion class is characterized by its measure of similarity relative to the other emotion classes. Generative models such as Gaussian mixture models (GMMs) are often used as front-end systems to generate feature vectors used to train complex back-end systems, such as support vector machines (SVMs) or a multilayer perceptron (MLP), to improve classification performance. We show that, in the context of highly unbalanced data classes, these back-end systems can improve the performance achieved by GMMs provided that an appropriate sampling or importance-weighting technique is applied. Furthermore, we show that anchor models based on the Euclidean or cosine distance offer a better alternative for enhancing performance, because neither of these techniques is needed to overcome the problem of skewed data. Experiments conducted on the FAU AIBO Emotion Corpus, a database of spontaneous children's speech, show that anchor models significantly improve the performance of GMMs, by 6.2 percent relative. We also show that introducing within-class covariance normalization (WCCN) improves the performance of the anchor models for both distances, but to a greater extent for the Euclidean distance, for which the results become competitive with the cosine distance.
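
As a concrete illustration of the anchor-model idea, here is a minimal sketch in Python (shapes, helper names, and configuration are assumptions, not the paper's exact setup): each utterance is mapped to a vector of average log-likelihoods under one GMM per emotion class, and test utterances are scored by cosine similarity to per-class mean anchor vectors.

```python
# Minimal anchor-model sketch, assuming frame-level feature matrices of
# shape (n_frames, dim); data loading and the paper's settings are omitted.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_gmms(features_per_class, n_components=8):
    """Fit one GMM per emotion class on that class's frame-level features."""
    return {label: GaussianMixture(n_components=n_components,
                                   covariance_type="diag").fit(frames)
            for label, frames in features_per_class.items()}

def anchor_vector(frames, gmms, labels):
    """Represent an utterance by its average log-likelihood under each class GMM."""
    return np.array([gmms[label].score(frames) for label in labels])

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def classify(frames, gmms, labels, class_means):
    """class_means: per-class mean anchor vectors computed on training data."""
    v = anchor_vector(frames, gmms, labels)
    return max(labels, key=lambda c: cosine_similarity(v, class_means[c]))
```

The Euclidean-distance variant described in the abstract would simply swap the cosine similarity for a (negated) Euclidean distance in the final step.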


International Conference on Acoustics, Speech, and Signal Processing | 2013

Multiple windowed spectral features for emotion recognition

Yazid Attabi; Jahangir Alam; Pierre Dumouchel; Patrick Kenny; Douglas D. O'Shaughnessy

MFCC (Mel-frequency cepstral coefficients) and PLP (perceptual linear prediction) or RASTA-PLP features have demonstrated good results, whether used in combination with prosodic features as suprasegmental (long-term) information or stand-alone as segmental (short-time) information. MFCC and PLP feature parameterization aims to represent the speech parameters in a way similar to how sound is perceived by humans. However, MFCC and PLP features are usually computed from a Hamming-windowed periodogram spectrum estimate, which is characterized by large variance. In this paper we study the effect, on emotion recognition performance, of averaging spectral estimates obtained using a set of orthogonal tapers (windows). The multitaper MFCC and PLP features are examined separately as short-time information vectors modeled using Gaussian mixture models (GMMs). When tested on the FAU AIBO spontaneous emotion corpus, multiple-windowed spectral features achieve a relative improvement ranging from 2.2% to 3.9% over single-windowed ones, for both the MFCC and PLP systems.
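
To make the variance-reduction idea concrete, here is a hedged sketch of a multitaper spectrum estimate (the taper count and time-bandwidth product below are assumptions, not the paper's settings).

```python
# Sketch of a multitaper spectrum estimate: average periodograms over a set
# of orthogonal DPSS (Slepian) tapers instead of using a single Hamming window.
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrum(frame, n_tapers=6, time_bandwidth=3.0):
    n = len(frame)
    tapers = dpss(n, NW=time_bandwidth, Kmax=n_tapers)   # (n_tapers, n)
    periodograms = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return periodograms.mean(axis=0)   # lower-variance spectrum estimate
```

In an MFCC or PLP front end, this averaged spectrum would replace the single-window periodogram ahead of the usual Mel filterbank and cepstral stages.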


Information Sciences, Signal Processing and Their Applications | 2012

Emotion recognition from speech: WOC-NN and class-interaction

Yazid Attabi; Pierre Dumouchel

This study extends the Weighted Ordered Classes-Nearest Neighbors (WOC-NN) method, a class-similarity-based method introduced in our previous work [1]. WOC-NN computes similarities between a test instance and a class pattern for each emotion class in the likelihood space. An emotion class pattern is a representation of its ranked neighboring classes, weighted according to their discrimination capability. In this study, the class-rank weights are normalized within each class pattern. We also study a new distance-pattern model based on double class ranks, introduced to take into account the interaction between the rank variables. The performance of the system based on double class ranks exceeds that of systems based on a single class rank. Furthermore, using the likelihood-score rank of all class models in the decision rule of WOC-NN adds valuable information for data discrimination. Experiments on the FAU AIBO corpus show that the WOC-NN approach improves performance by 5.1% relative compared to the Bayes decision rule. The result obtained also outperforms the state of the art.
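
The rank-pattern idea can be sketched as follows (a rough Python illustration; the weighting scheme and distance are simplified stand-ins, not the paper's exact formulation): each utterance's class likelihoods induce an ordering of the classes, and a test ordering is compared against each class's stored ordering using rank-position weights.

```python
# Rough sketch of rank-pattern matching in the spirit of WOC-NN
# (hypothetical weights; the paper's normalization is not reproduced).
import numpy as np

def rank_pattern(loglik, labels):
    """Order class labels from most to least likely for one utterance."""
    return [labels[i] for i in np.argsort(loglik)[::-1]]

def pattern_distance(test_ranks, class_ranks, weights):
    """Weighted sum of rank displacements between two class orderings."""
    pos = {label: r for r, label in enumerate(class_ranks)}
    return sum(w * abs(r - pos[label])
               for r, (label, w) in enumerate(zip(test_ranks, weights)))

def classify(loglik, labels, class_patterns, weights):
    """Pick the class whose stored rank pattern best matches the test ranking."""
    test_ranks = rank_pattern(loglik, labels)
    return min(class_patterns,
               key=lambda c: pattern_distance(test_ranks, class_patterns[c], weights))
```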


IEEE Transactions on Affective Computing | 2017

Feature Learning from Spectrograms for Assessment of Personality Traits

Marc-André Carbonneau; Eric Granger; Yazid Attabi; Ghyslain Gagnon

Several methods have recently been proposed to analyze speech and automatically infer the personality of the speaker. These methods often rely on prosodic and other hand-crafted speech-processing features extracted with off-the-shelf toolboxes. To achieve high accuracy, numerous features are typically extracted using complex and highly parameterized algorithms. In this paper, a new method based on feature learning and spectrogram analysis is proposed to simplify the feature-extraction process while maintaining a high level of accuracy. The proposed method learns a dictionary of discriminant features from patches extracted from the spectrogram representations of training speech segments. Each speech segment is then encoded using the dictionary, and the resulting feature set is used to classify personality traits. Experiments indicate that the proposed method achieves state-of-the-art results with an important reduction in complexity compared to the most recent reference methods. The number of features, and the difficulties linked to the feature-extraction process, are greatly reduced, as only one type of descriptor is used, whose 7 parameters can be tuned automatically. In contrast, the simplest reference method uses 4 types of descriptors to which 6 functionals are applied, resulting in over 20 parameters to tune.
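
A hedged sketch of the patch-based pipeline is shown below (k-means stands in for the paper's dictionary learner, and the patch size, stride, and dictionary size are assumptions): atoms are learned from spectrogram patches, then each segment is encoded as a bag-of-atoms histogram that a standard classifier can consume.

```python
# Illustrative patch-based feature learning from spectrograms.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def extract_patches(spectrogram, patch=(16, 16), stride=8):
    """Slide a window over the 2-D spectrogram and flatten each patch."""
    h, w = spectrogram.shape
    ph, pw = patch
    return np.array([spectrogram[i:i + ph, j:j + pw].ravel()
                     for i in range(0, h - ph + 1, stride)
                     for j in range(0, w - pw + 1, stride)])

def learn_dictionary(train_spectrograms, n_atoms=256):
    """Cluster patches from all training segments into a dictionary of atoms."""
    patches = np.vstack([extract_patches(s) for s in train_spectrograms])
    return MiniBatchKMeans(n_clusters=n_atoms).fit(patches)

def encode(spectrogram, dictionary):
    """Encode a segment as a histogram of nearest-atom assignments."""
    atom_ids = dictionary.predict(extract_patches(spectrogram))
    return np.bincount(atom_ids, minlength=dictionary.n_clusters)
```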


Robot and Human Interactive Communication | 2016

Integration framework for speech processing with live visualization interfaces

David Brodeur; François Grondin; Yazid Attabi; Pierre Dumouchel; François Michaud

Audition is a rich source of spatial, identity, linguistic, and paralinguistic information. Processing all this information requires the acquisition, processing, and interpretation of sound sources, which are instantaneous, invisible, and noisy signals; this can lead the system to respond differently depending on the information perceived. This paper presents our first implementation of an integration framework for speech processing. Acquisition includes sound capture; sound source localization, tracking, separation, and enhancement; and voice activity detection. Processing involves speech and emotion recognition. Interpretation consists of translating speech utterances into commands that can influence the interaction through dialogue management and speech synthesis. The paper also describes two visualization interfaces, inspired by comic strips, that represent live vocal interactions in real-life environments. These interfaces are used to demonstrate how the framework performs in live interactions and to support a usability study.
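
As a toy illustration of the staged acquisition/processing/interpretation design (stage names are illustrative, not the framework's actual modules or API), the flow can be modeled as functions that each enrich a shared context:

```python
# Toy staged-pipeline sketch; a real framework would run stages concurrently
# on streaming audio rather than on a single chunk.
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]  # each stage reads and enriches a shared context

def run_pipeline(stages: List[Stage], audio_chunk) -> Dict:
    ctx = {"audio": audio_chunk}
    for stage in stages:
        ctx = stage(ctx)  # e.g. localization, VAD, speech/emotion recognition
    return ctx

def voice_activity_detection(ctx: Dict) -> Dict:
    # Placeholder energy gate standing in for a real VAD module.
    ctx["is_speech"] = sum(s * s for s in ctx["audio"]) > 1e-3
    return ctx
```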


International Workshop on Ambient Assisted Living | 2014

Automatic emotion recognition from cochlear implant-like spectrally reduced speech

Jahangir Alam; Yazid Attabi; Patrick Kenny; Pierre Dumouchel; Douglas O’Shaughnessy

In this paper we study the performance of emotion recognition from cochlear implant-like spectrally reduced speech (SRS) using conventional Mel-frequency cepstral coefficients (MFCCs) and a Gaussian mixture model (GMM)-based classifier. The cochlear implant-like SRS of each utterance in the emotional speech corpus is obtained solely from low-bandwidth subband temporal envelopes of the corresponding original utterance. The resulting utterances carry less spectral information than the originals but retain the information most relevant for emotion recognition. The emotion classes are trained on MFCC features extracted from the SRS signals, and classification is performed using MFCC features computed from the test SRS signals. To evaluate the performance of the SRS-MFCC features, emotion recognition experiments are conducted on the FAU AIBO spontaneous emotion corpus. Conventional MFCC, Mel-warped DFT (discrete Fourier transform) spectrum-based cepstral coefficient (MWDCC), perceptual linear prediction (PLP), and amplitude modulation cepstral coefficient (AMCC) features extracted from the original signals are used for comparison. Experimental results show that the SRS-MFCC features outperform all other features in terms of emotion recognition accuracy. Average relative improvements over all baseline systems are 1.5% and 11.6% in terms of unweighted average recall and weighted average recall, respectively.
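
To ground the SRS idea, here is a hedged Python sketch (band edges, filter orders, and envelope cutoff are assumptions; the paper's exact vocoder settings are not reproduced): the signal is split into subbands, each band's low-pass temporal envelope is extracted, and the envelopes are re-imposed on sinusoidal carriers at the band centers.

```python
# Cochlear-implant-like spectral reduction: keep only subband envelopes.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def spectrally_reduced_speech(x, fs, edges=(100, 500, 1500, 4000), env_cut=50.0):
    out = np.zeros_like(x, dtype=float)
    t = np.arange(len(x)) / fs
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, x)                 # analysis subband
        env = np.abs(hilbert(band))              # temporal envelope
        be, ae = butter(2, env_cut / (fs / 2))   # low-pass -> low-bandwidth envelope
        env = filtfilt(be, ae, env)
        fc = np.sqrt(lo * hi)                    # geometric band-center carrier
        out += env * np.sin(2 * np.pi * fc * t)  # re-impose envelope on carrier
    return out
```

MFCC features would then be computed from `out` exactly as they would be from the original waveform.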


Affective Computing and Intelligent Interaction | 2011

Automatic emotion recognition from speech: a PhD research proposal

Yazid Attabi; Pierre Dumouchel

This paper contains a PhD research proposal in the domain of automatic emotion recognition from the speech signal. We start by identifying our research problem, namely the acute confusion between emotion classes, and cite different sources of this ambiguity. In the methodology section, we present a method based on the concept of similarity between class and instance patterns, which we dub Weighted Ordered Classes-Nearest Neighbors (WOC-NN). The first result obtained outperforms the best state-of-the-art result. Finally, as future work, we propose ways to improve the performance of the proposed system.


Conference of the International Speech Communication Association | 2009

Cepstral and long-term features for emotion recognition

Pierre Dumouchel; Najim Dehak; Yazid Attabi; Réda Dehak; Narjès Boufaden


Conference of the International Speech Communication Association | 2012

Anchor Models and WCCN Normalization For Speaker Trait Classification.

Yazid Attabi; Pierre Dumouchel


Conference of the International Speech Communication Association | 2013

Amplitude modulation features for emotion recognition from speech

Md. Jahangir Alam; Yazid Attabi; Pierre Dumouchel; Patrick Kenny; Douglas D. O'Shaughnessy

Collaboration


Dive into Yazid Attabi's collaborations.

Top Co-Authors

Pierre Dumouchel, École de technologie supérieure
Douglas D. O'Shaughnessy, Institut national de la recherche scientifique
David Brodeur, Université de Sherbrooke
Eric Granger, École de technologie supérieure
Ghyslain Gagnon, École de technologie supérieure