Publication


Featured research published by Erik Marchi.


ACM Multimedia | 2015

AV+EC 2015: The First Affect Recognition Challenge Bridging Across Audio, Video, and Physiological Data

Fabien Ringeval; Björn W. Schuller; Michel F. Valstar; Shashank Jaiswal; Erik Marchi; Denis Lalanne; Roddy Cowie; Maja Pantic

We present the first Audio-Visual+ Emotion recognition Challenge and workshop (AV+EC 2015), aimed at comparing multimedia processing and machine learning methods for automatic audio, visual, and physiological emotion analysis. This is the fifth event in the AVEC series, but the very first Challenge that bridges across audio, video, and physiological data. The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the audio, video, and physiological emotion recognition communities, in order to compare the relative merits of the three approaches to emotion recognition under well-defined and strictly comparable conditions, and to establish to what extent fusion of the approaches is possible and beneficial. This paper presents the challenge, the dataset, and the performance of the baseline system.


Molecular Autism | 2016

An investigation of the ‘female camouflage effect’ in autism using a computerized ADOS-2 and a test of sex/gender differences

Agnieszka Rynkiewicz; Bjoern W. Schuller; Erik Marchi; Stefano Piana; Antonio Camurri; Amandine Lassalle; Simon Baron-Cohen

Background: Autism spectrum conditions (autism) are diagnosed more frequently in boys than in girls. Females with autism may have been under-identified due not only to a male-biased understanding of autism but also to females' camouflaging. The study describes a new technique that allows automated coding of the non-verbal mode of communication (gestures) and offers the possibility of objective evaluation of gestures, independent of human judgment. The EyesWeb software platform and the Kinect sensor were used during two demonstration activities of ADOS-2 (Autism Diagnostic Observation Schedule, Second Edition).

Methods: The study group consisted of 33 high-functioning Polish girls and boys with a formal diagnosis of autism or Asperger syndrome, aged 5–10, with fluent speech and average or above-average IQ, and their parents (girls with autism, n = 16; boys with autism, n = 17). All children were assessed during two demonstration activities of Module 3 of ADOS-2, administered in Polish and coded using Polish codes. Children were also assessed with Polish versions of the Eyes and Faces Tests. Parents provided information on the author-reviewed Polish research translation of the SCQ (Social Communication Questionnaire, Current and Lifetime) and the Polish version of the AQ Child (Autism Spectrum Quotient, Child).

Results: Girls with autism tended to use gestures more vividly than boys with autism during the two demonstration activities of ADOS-2. Girls with autism made significantly more mistakes than boys with autism on the Faces Test. All children with autism had high scores on the AQ Child, confirming the presence of autistic traits in this group. The current communication skills of boys with autism reported by parents in the SCQ were significantly better than those of girls with autism. However, both girls and boys with autism improved their social and communication abilities over the lifetime. The number of stereotypic behaviours in boys significantly decreased over life, whereas it remained at a comparable level in girls with autism.

Conclusions: High-functioning females with autism might present better on the non-verbal (gesture) mode of communication than boys with autism, which may camouflage other diagnostic features and poses a risk of under-diagnosis, or of not receiving an appropriate diagnosis, for this population. Further research is required to examine this phenomenon so that appropriate gender revisions to diagnostic assessments can be implemented.
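
The abstract does not detail the gesture-coding pipeline (EyesWeb plus the Kinect sensor), so the following is only a loose, hypothetical illustration of what an automated, judgment-free gesture measure could look like: a motion-energy score computed from skeleton joint trajectories, assuming NumPy. Nothing below is the study's actual measure.

```python
import numpy as np

def gesture_vividness(joints, fps=30.0):
    """Mean speed of tracked joints over a recording.

    joints: array of shape (frames, n_joints, 3) with 3-D joint positions.
    This is a hypothetical proxy for gesture 'vividness', not the study's metric.
    """
    velocity = np.diff(joints, axis=0) * fps      # per-joint velocity in units/s
    speed = np.linalg.norm(velocity, axis=-1)     # (frames-1, n_joints)
    return speed.mean()                           # higher = more vivid motion

joints = np.random.rand(300, 20, 3)               # 10 s of dummy skeleton data
print(f"vividness score: {gesture_vividness(joints):.3f}")
```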


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks

Erik Marchi; Fabio Vesperini; Florian Eyben; Stefano Squartini; Björn W. Schuller

Acoustic novelty detection aims at identifying abnormal/novel acoustic signals which differ from the reference/normal data that the system was trained with. In this paper, we present a novel unsupervised approach based on a denoising autoencoder. In our approach, auditory spectral features are processed by a denoising autoencoder with bidirectional Long Short-Term Memory recurrent neural networks. We use the reconstruction error between the input and the output of the autoencoder as the activation signal to detect novel events. The autoencoder is trained on a public database which contains recordings of typical in-home situations such as talking, watching television, playing, and eating. The evaluation was performed on more than 260 different abnormal events. We compare results with state-of-the-art methods and conclude that our novel approach significantly outperforms existing methods, achieving up to 93.4% F-Measure.
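
As a minimal sketch of the idea, assuming PyTorch: a bidirectional-LSTM autoencoder reconstructs the auditory spectral features, and the per-frame reconstruction error serves as the novelty signal. Layer sizes and the threshold below are illustrative placeholders, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class BLSTMAutoencoder(nn.Module):
    def __init__(self, n_features=26, hidden=64):
        super().__init__()
        self.blstm = nn.LSTM(n_features, hidden, batch_first=True,
                             bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_features)  # project back to feature space

    def forward(self, x):                 # x: (batch, frames, n_features)
        h, _ = self.blstm(x)
        return self.out(h)                # reconstruction of the input

def novelty_scores(model, features):
    """Per-frame reconstruction error; high error suggests a novel event."""
    with torch.no_grad():
        recon = model(features)
        return ((features - recon) ** 2).mean(dim=-1)  # (batch, frames)

model = BLSTMAutoencoder()                 # would be trained on 'normal' audio only
x = torch.randn(1, 100, 26)                # dummy auditory spectral features
scores = novelty_scores(model, x)
threshold = scores.mean() + 2 * scores.std()   # illustrative decision rule
novel_frames = scores > threshold
```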


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks

Erik Marchi; Giacomo Ferroni; Florian Eyben; Leonardo Gabrielli; Stefano Squartini; Björn W. Schuller

A plethora of different onset detection methods have been proposed in recent years. However, few attempts have been made at widely applicable approaches that achieve superior performance across different types of music with considerable temporal precision. In this paper, we present a multi-resolution approach based on the discrete wavelet transform and linear prediction filtering that improves the time resolution and performance of onset detection in different musical scenarios. In our approach, wavelet coefficients and forward prediction errors are combined with auditory spectral features and then processed by a bidirectional Long Short-Term Memory recurrent neural network, which acts as a reduction function. The network is trained with a large database of onset data covering various genres and onset types. We compare results with state-of-the-art methods on a dataset that includes the Bello, Glover, and ISMIR 2004 Ballroom sets, and we conclude that our approach significantly outperforms existing methods in terms of F-Measure. For pitched non-percussive music, an absolute improvement of 7.5% is reported.
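
A rough sketch of the feature side of this approach, assuming NumPy and PyWavelets: per-band energies from a discrete wavelet decomposition are combined with the forward linear prediction error of each frame. The wavelet family, decomposition level, and LPC order are illustrative choices, not the paper's exact settings.

```python
import numpy as np
import pywt

def lpc_forward_error(frame, order=8):
    """Forward prediction error of a frame via autocorrelation-method LPC."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-6 * np.eye(order), r[1:order + 1])  # predictor coeffs
    pred = np.convolve(frame, np.concatenate(([0.0], a)))[:len(frame)]
    return frame - pred                    # residual: what the predictor misses

def multires_features(frame, wavelet='db4', level=3, order=8):
    """Concatenate per-band DWT log energies with the LPC error log energy."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    band_energies = [np.log(np.sum(c ** 2) + 1e-10) for c in coeffs]
    err = lpc_forward_error(frame, order)
    return np.array(band_energies + [np.log(np.sum(err ** 2) + 1e-10)])

frame = np.random.randn(1024)              # one analysis frame of audio
print(multires_features(frame))            # compact multi-resolution descriptor
```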


International Conference on Privacy, Security, Risk and Trust (PASSAT) | 2012

Speech, Emotion, Age, Language, Task, and Typicality: Trying to Disentangle Performance and Feature Relevance

Erik Marchi; Anton Batliner; Björn W. Schuller; Shimrit Fridenzon; Shahar Tal; Ofer Golan

The availability of speech corpora is positively correlated with typicality: the more typical the population we draw our sample from, the easier it is to get enough data; the less typical the envisaged population, the more difficult it is. Children with Autism Spectrum Condition are atypical in several respects: they are children, they might have problems with an experimental setting in which their speech is recorded, and they belong to a specific subgroup of children. We therefore address two possible strategies. First, we analyse feature relevance for samples taken from different populations; this does not directly improve performance, but we find additional specific features within specific groups. Second, we perform cross-corpus experiments to evaluate whether enriching the training data with data obtained from similar populations can increase classification performance. In this pilot study, we use four different samples of speakers, all of them producing one and the same emotion and, in addition, the neutral state. We used two publicly available databases, the Berlin Emotional Speech database and the FAU Aibo Corpus, in addition to our own ASC-Inclusion database.
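
A hedged sketch of the second strategy, assuming scikit-learn: train a classifier on one corpus, then enrich the training set with a second corpus from a related population and compare unweighted average recall. The feature arrays and labels below are random placeholders standing in for the actual databases.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X_own, y_own = rng.normal(size=(200, 40)), rng.integers(0, 2, 200)   # e.g. target corpus
X_aux, y_aux = rng.normal(size=(300, 40)), rng.integers(0, 2, 300)   # e.g. auxiliary corpus
X_test, y_test = rng.normal(size=(100, 40)), rng.integers(0, 2, 100)

baseline = SVC().fit(X_own, y_own)
enriched = SVC().fit(np.vstack([X_own, X_aux]),
                     np.concatenate([y_own, y_aux]))   # cross-corpus enrichment

for name, clf in [('own-corpus only', baseline), ('enriched', enriched)]:
    uar = recall_score(y_test, clf.predict(X_test), average='macro')  # unweighted avg recall
    print(f'{name}: UAR = {uar:.3f}')
```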


Conference of the International Speech Communication Association (INTERSPEECH) | 2016

Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks

Zixing Zhang; Fabien Ringeval; Jing Han; Jun Deng; Erik Marchi; Björn W. Schuller

During the last decade, speech emotion recognition technology has matured well enough to be used in some real-life scenarios. However, these scenarios require an almost silent environment to not compromise the performance of the system. Emotion recognition technology from speech thus needs to evolve and face more challenging conditions, such as environmental additive and convolutional noises, in order to broaden its applicability to real-life conditions. This contribution evaluates the impact of a front-end feature enhancement method based on an autoencoder with long short-term memory neural networks, for robust emotion recognition from speech. Support Vector Regression is then used as a back-end for time- and value-continuous emotion prediction from enhanced features. We perform extensive evaluations on both non-stationary additive noise and convolutional noise, on a database of spontaneous and natural emotions. Results show that the proposed method significantly outperforms a system trained on raw features, for both arousal and valence dimensions, while having almost no degradation when applied to clean speech.
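
A minimal sketch of the pipeline shape, assuming scikit-learn: a front-end learns a noisy-to-clean feature mapping, and Support Vector Regression serves as the back-end for value-continuous prediction. The paper's front-end is an LSTM autoencoder; a small MLP stands in here purely to keep the example short, and all data and dimensions are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(1)
clean = rng.normal(size=(500, 30))                  # clean acoustic features
noisy = clean + 0.3 * rng.normal(size=clean.shape)  # additive-noise version
arousal = clean[:, 0] * 0.5 + 0.1 * rng.normal(size=500)  # dummy continuous target

# Front-end: learn a noisy -> clean mapping (denoising objective).
enhancer = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500).fit(noisy, clean)
enhanced = enhancer.predict(noisy)

# Back-end: SVR on the enhanced features for value-continuous prediction.
svr = SVR().fit(enhanced, arousal)
print('prediction on an enhanced frame:', svr.predict(enhanced[:1]))
```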


International Symposium on Neural Networks | 2015

Non-linear prediction with LSTM recurrent neural networks for acoustic novelty detection

Erik Marchi; Fabio Vesperini; Felix Weninger; Florian Eyben; Stefano Squartini; Björn W. Schuller

Acoustic novelty detection aims at identifying abnormal/novel acoustic signals which differ from the reference/normal data that the system was trained with. In this paper, we present a novel approach based on non-linear predictive denoising autoencoders. In our approach, auditory spectral features of the next short-term frame are predicted from the previous frames by means of Long Short-Term Memory (LSTM) recurrent denoising autoencoders. We show that this yields an effective generative model for audio. The reconstruction error between the input and the output of the autoencoder is used as the activation signal to detect novel events. The autoencoder is trained on a public database which contains recordings of typical in-home situations such as talking, watching television, playing, and eating. The evaluation was performed on more than 260 different abnormal events. We compare results with state-of-the-art methods and conclude that our novel approach significantly outperforms existing methods, achieving up to 94.4% F-Measure.
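
A rough sketch of the non-linear prediction idea, assuming PyTorch: an LSTM predicts the next short-term frame from the previous frames, and the per-frame prediction error acts as the novelty signal. Architecture and decision rule are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LSTMPredictor(nn.Module):
    def __init__(self, n_features=26, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                  # x: (batch, frames, n_features)
        h, _ = self.lstm(x)
        return self.out(h)                 # prediction of the *next* frame

model = LSTMPredictor()                    # would be trained on normal audio only
x = torch.randn(1, 100, 26)                # dummy auditory spectral features
with torch.no_grad():
    pred = model(x[:, :-1])                # predict frames 1..T from frames 0..T-1
    err = ((x[:, 1:] - pred) ** 2).mean(dim=-1)  # per-frame prediction error
novel = err > err.mean() + 2 * err.std()   # illustrative decision rule
```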


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Enhanced semi-supervised learning for multimodal emotion recognition

Zixing Zhang; Fabien Ringeval; Bin Dong; Eduardo Coutinho; Erik Marchi; Björn W. Schuller

Semi-Supervised Learning (SSL) techniques have found many applications where labeled data is scarce and/or expensive to obtain. However, SSL suffers from various inherent limitations that limit its performance in practical applications. A central problem is that the low performance a classifier can deliver on challenging recognition tasks reduces the trustworthiness of the automatically labeled data. Another related issue is the noise accumulation problem: instances that are misclassified by the system are still used to train it in future iterations. In this paper, we propose to address both issues in the context of emotion recognition. Initially, we exploit the complementarity between audio-visual features to improve the performance of the classifier during the supervised phase. Then, we iteratively re-evaluate the automatically labeled instances to correct possibly mislabeled data, which enhances the overall confidence of the system's predictions. Experimental results on the RECOLA database demonstrate that our methodology delivers a strong performance in the classification of high/low emotional arousal (UAR = 76.5%) and significantly outperforms traditional SSL methods by at least 5.0% (absolute gain).
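
A simplified sketch of the self-training loop with re-evaluation, assuming scikit-learn: confidently predicted unlabeled instances are added to the training set, and all automatically labeled instances are re-scored on every iteration so that earlier mislabelings can be corrected. The data, the classifier, and the 0.8 confidence cut-off are illustrative stand-ins for the paper's audio-visual system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X_lab, y_lab = rng.normal(size=(50, 20)), rng.integers(0, 2, 50)
X_unlab = rng.normal(size=(500, 20))

clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

for it in range(5):
    proba = clf.predict_proba(X_unlab)
    conf = proba.max(axis=1)
    # Re-evaluate *all* unlabeled data each round, overwriting earlier auto-labels
    # so that previously mislabeled instances can be corrected.
    auto_labels = np.where(conf > 0.8, proba.argmax(axis=1), -1)  # -1 = not trusted
    mask = auto_labels != -1
    X_train = np.vstack([X_lab, X_unlab[mask]])
    y_train = np.concatenate([y_lab, auto_labels[mask]])
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(f'iteration {it}: {mask.sum()} auto-labeled instances trusted')
```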


International Symposium on Neural Networks | 2014

Audio onset detection: A wavelet packet based approach with recurrent neural networks

Erik Marchi; Giacomo Ferroni; Florian Eyben; Stefano Squartini; Björn W. Schuller

This paper concerns the exploitation of multi-resolution time-frequency features via the Wavelet Packet Transform to improve audio onset detection. In our approach, Wavelet Packet Energy Coefficients (WPEC) and Auditory Spectral Features (ASF) are processed by a Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network that yields the onset locations. The combination of the two feature sets, together with the BLSTM-based detector, forms an advanced energy-based approach that takes advantage of the multi-resolution analysis given by the wavelet decomposition of the audio input signal. The neural network is trained with a large database of onset data covering various genres and onset types. Due to its data-driven nature, our approach does not require the onset detection method and its parameters to be tuned to a particular type of music. We show a comparison with other types and sizes of recurrent neural networks, and we compare results with state-of-the-art methods on the whole onset dataset. We conclude that our approach significantly increases performance in terms of F-measure without any constraint on music genre or onset type.
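
A short sketch of the WPEC idea, assuming PyWavelets: decompose an analysis frame with the Wavelet Packet Transform and take the log energy of each frequency-ordered sub-band; the paper combines such coefficients with auditory spectral features before the BLSTM detector. The wavelet family and decomposition depth are illustrative.

```python
import numpy as np
import pywt

def wpec(frame, wavelet='db4', level=4):
    """Log energy of each frequency-ordered wavelet packet sub-band."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, maxlevel=level)
    bands = wp.get_level(level, order='freq')        # 2**level sub-bands
    return np.array([np.log(np.sum(node.data ** 2) + 1e-10) for node in bands])

frame = np.random.randn(1024)                        # one analysis frame of audio
print(wpec(frame))                                   # 16 per-band energy features
```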


Conference of the International Speech Communication Association (INTERSPEECH) | 2016

Automatic Analysis of Typical and Atypical Encoding of Spontaneous Emotion in the Voice of Children.

Fabien Ringeval; Erik Marchi; Charline Grossard; Jean Xavier; Mohamed Chetouani; David Cohen; Björn W. Schuller

Children with Autism Spectrum Disorders (ASD) have significant difficulties understanding and expressing emotions. Systems have thus been proposed to provide objective measurements of the acoustic features used by children with ASD to encode emotion in speech. However, only a few studies have exploited such systems to compare different groups of children in their ability to express emotions, and even fewer have focused on the analysis of spontaneous emotion. In this contribution, we provide insights through extensive evaluations carried out on a new database of spontaneous speech inducing three emotion categories of valence (positive, neutral, and negative). We evaluate the potential of using an automatic recognition system to differentiate groups of children, i.e., pervasive developmental disorders, pervasive developmental disorders not-otherwise-specified, specific language impairments, and typically developing, in their abilities to express spontaneous emotion in a common unconstrained task. Results show that all groups of children can be differentiated directly (diagnosis recognition) and indirectly (emotion recognition) by the proposed system.

Collaboration


Dive into Erik Marchi's collaborations.

Top Co-Authors

Anton Batliner
University of Erlangen-Nuremberg

Stefano Squartini
Marche Polytechnic University

Sven Bölte
Stockholm County Council

Delia Pigat
University of Cambridge

Ian Davies
University of Cambridge