Fabio Vesperini | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Fabio Vesperini is active.

Explore More

Publication

Featured researches published by Fabio Vesperini.

international conference on acoustics, speech, and signal processing | 2015

A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks

Erik Marchi; Fabio Vesperini; Florian Eyben; Stefano Squartini; Björn W. Schuller

Acoustic novelty detection aims at identifying abnormal/novel acoustic signals which differ from the reference/normal data that the system was trained with. In this paper we present a novel unsupervised approach based on a denoising autoencoder. In our approach auditory spectral features are processed by a denoising autoencoder with bidirectional Long Short-Term Memory recurrent neural networks. We use the reconstruction error between the input and the output of the autoencoder as activation signal to detect novel events. The autoencoder is trained on a public database which contains recordings of typical in-home situations such as talking, watching television, playing and eating. The evaluation was performed on more than 260 different abnormal events. We compare results with state-of-the-art methods and we conclude that our novel approach significantly outperforms existing methods by achieving up to 93.4% F-Measure.

international symposium on neural networks | 2015

Non-linear prediction with LSTM recurrent neural networks for acoustic novelty detection

Erik Marchi; Fabio Vesperini; Felix Weninger; Florian Eyben; Stefano Squartini; Björn W. Schuller

Acoustic novelty detection aims at identifying abnormal/novel acoustic signals which differ from the reference/normal data that the system was trained with. In this paper we present a novel approach based on non-linear predictive denoising autoencoders. In our approach, auditory spectral features of the next short-term frame are predicted from the previous frames by means of Long-Short Term Memory (LSTM) recurrent denoising autoencoders. We show that this yields an effective generative model for audio. The reconstruction error between the input and the output of the autoencoder is used as activation signal to detect novel events. The autoencoder is trained on a public database which contains recordings of typical in-home situations such as talking, watching television, playing and eating. The evaluation was performed on more than 260 different abnormal events. We compare results with state-of-the-art methods and we conclude that our novel approach significantly outperforms existing methods by achieving up to 94.4% F-Measure.

Computational Intelligence and Neuroscience | 2017

Deep Recurrent Neural Network-Based Autoencoders for Acoustic Novelty Detection

Erik Marchi; Fabio Vesperini; Stefano Squartini; Björn W. Schuller

In the emerging field of acoustic novelty detection, most research efforts are devoted to probabilistic approaches such as mixture models or state-space models. Only recent studies introduced (pseudo-)generative models for acoustic novelty detection with recurrent neural networks in the form of an autoencoder. In these approaches, auditory spectral features of the next short term frame are predicted from the previous frames by means of Long-Short Term Memory recurrent denoising autoencoders. The reconstruction error between the input and the output of the autoencoder is used as activation signal to detect novel events. There is no evidence of studies focused on comparing previous efforts to automatically recognize novel events from audio signals and giving a broad and in depth evaluation of recurrent neural network-based autoencoders. The present contribution aims to consistently evaluate our recent novel approaches to fill this white spot in the literature and provide insight by extensive evaluations carried out on three databases: A3Novelty, PASCAL CHiME, and PROMETHEUS. Besides providing an extensive analysis of novel and state-of-the-art methods, the article shows how RNN-based autoencoders outperform statistical approaches up to an absolute improvement of 16.4% average F-measure over the three databases.

international joint conference on neural network | 2016

Deep neural networks for Multi-Room Voice Activity Detection: Advancements and comparative evaluation

Fabio Vesperini; Paolo Vecchiotti; Stefano Squartini; Francesco Piazza

This paper focuses on Voice Activity Detectors (VAD) for multi-room domestic scenarios based on deep neural network architectures. Interesting advancements are observed with respect to a previous work. A comparative and extensive analysis is lead among four different neural networks (NN). In particular, we exploit Deep Belief Network (DBN), Multi-Layer Perceptron (MLP), Bidirectional Long Short-Term Memory recurrent neural network (BLSTM) and Convolutional Neural Network (CNN). The latter has recently encountered a large success in the computational audio processing field and it has been successfully employed in our task. Two home recorded datasets are used in order to approximate real-life scenarios. They contain audio files from several microphones arranged in various rooms, from whom six features are extracted and used as input for the deep neural classifiers. The output stage has been redesigned compared to the previous authors contribution, in order to take advantage of the networks discriminative ability. Our study is composed by a multi-stage analysis focusing on the selection of the features, the network size and the input microphones. Results are evaluated in terms of Speech Activity Detection error rate (SAD). As result, a best SAD equal to 5.8% and 2.6% is reached respectively in the two considered datasets. In addiction, a significant solidity in terms of microphone positioning is observed in the case of CNN.

international workshop on machine learning for signal processing | 2016

A neural network based algorithm for speaker localization in a multi-room environment

Fabio Vesperini; Paolo Vecchiotti; Stefano Squartini; Francesco Piazza

A Speaker Localization algorithm based on Neural Networks for multi-room domestic scenarios is proposed in this paper. The approach is fully data-driven and employs a Neural Network fed by GCC-PHAT (Generalized Cross Correlation Phase Transform) Patterns, calculated by means of the microphone signals, to determine the speaker position in the room under analysis. In particular, we deal with a multi-room case study, in which the acoustic scene of each room is influenced by sounds emitted in the other rooms. The algorithm is tested against the home recorded DIRHA dataset, characterized by multiple wall and ceiling microphone signals for each room. In particular, we focused on the speaker localization problem in two distinct neighbouring rooms. We assumed the presence of an Oracle multi-room Voice Activity Detector (VAD) in our experiments. A three-stage optimization procedure has been adopted to find the best network configuration and GCC-PHAT Patterns combination. Moreover, an algorithm based on Time Difference of Arrival (TDOA), recently proposed in literature for the addressed applicative context, has been considered as term of comparison. As result, the proposed algorithm outperforms the reference one, providing an average localization error, expressed in terms of RMSE, equal to 525 mm against 1465 mm. Concluding, we also assessed the algorithm performance when a real VAD, recently proposed by some of the authors, is used. Even though a degradation of localization capability is registered (an average RMSE equal to 770 mm), still a remarkable improvement with respect to the state of the art performance is obtained.

Archive | 2018

Convolutional Neural Networks with 3-D Kernels for Voice Activity Detection in a Multiroom Environment

Paolo Vecchiotti; Fabio Vesperini; Stefano Squartini; Francesco Piazza

This paper focuses on employing Convolutional Neural Networks (CNN) with 3-D kernels for Voice Activity Detectors in multi-room domestic scenarios (mVAD). This technology is compared with the Multi Layer Perceptron (MLP) and interesting advancements are observed with respect to previous works of the authors. In order to approximate real-life scenarios, the DIRHA dataset is exploited. It has been recorded in a home environment by means of several microphones arranged in various rooms. Our study is composed by a multi-stage analysis focusing on the selection of the network size and the input microphones in relation with their number and position. Results are evaluated in terms of Speech Activity Detection error rate (SAD). The CNN-mVAD outperforms the other method with a significant solidity in terms of performance statistics, achieving in the best overall case a SAD equal to 7.0%.

international symposium on neural networks | 2017

Acoustic novelty detection with adversarial autoencoders

Fabio Vesperini; Stefano Squartini; Francesco Piazza

Novelty detection is the task of recognising events the differ from a model of normality. This paper proposes an acoustic novelty detector based on neural networks trained with an adversarial training strategy. The proposed approach is composed of a feature extraction stage that calculates Log-Mel spectral features from the input signal. Then, an autoencoder network, trained on a corpus of “normal” acoustic signals, is employed to detect whether a segment contains an abnormal event or not. A novelty is detected if the Euclidean distance between the input and the output of the autoencoder exceeds a certain threshold. The innovative contribution of the proposed approach resides in the training procedure of the autoencoder network: instead of using the conventional training procedure that minimises only the Minimum Mean Squared Error loss function, here we adopt an adversarial strategy, where a discriminator network is trained to distinguish between the output of the autoencoder and data sampled from the training corpus. The autoencoder, then, is trained also by using the binary cross-entropy loss calculated at the output of the discriminator network. The performance of the algorithm has been assessed on a corpus derived from the PASCAL CHiME dataset. The results showed that the proposed approach provides a relative performance improvement equal to 0.26% compared to the standard autoencoder. The significance of the improvement has been evaluated with a one-tailed z-test and resulted significant with p < 0.001. The presented approach thus showed promising results on this task and it could be extended as a general training strategy for autoencoders if confirmed by additional experiments.

european signal processing conference | 2017

A neural network approach for sound event detection in real life audio

Michele Valenti; Dario Tonelli; Fabio Vesperini; Stefano Squartini

This paper presents and compares two algorithms based on artificial neural networks (ANNs) for sound event detection in real life audio. Both systems have been developed and evaluated with the material provided for the third task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge. For the first algorithm, we make use of an ANN trained on different features extracted from the down-mixed mono channel audio. Secondly, we analyse a binaural algorithm where the same feature extraction is performed on four different channels: the two binaural channels, the averaged monaural signal and the difference between the binaural channels. The proposed feature set comprehends, along with mel-frequency cepstral coefficients and log-mel energies, also activity information extracted with two different voice activity detection (VAD) algorithms. Moreover, we will present results obtained with two different neural architectures, namely multi-layer perceptrons (MLPs) and recurrent neural networks. The highest scores obtained on the DCASE 2016 evaluation dataset are achieved by a MLP trained on binaural features and adaptive energy VAD; they consist of an averaged error rate of 0.79 and an averaged F1 score of 48.1%, thus marking an improvement over the best score registered in the DCASE 2016 challenge.

international joint conference on neural network | 2016

Combining evolution strategies and neural network procedures for compression driver design.

Michele Gasparini; Fabio Vesperini; Stefania Cecchi; Stefano Squartini; Francesco Piazza; Romolo Toppi

Compression driver design involves the study of complex mathematical models characterized by a great number of variables, implying high computational cost and long design time. Therefore, an optimization procedure is required to enhance the design procedure, especially from the parameters point of view. In this paper, a combined approach based both on evolution strategy procedure and neural network model is presented. Taking into consideration several tests on a real compression driver, the proposed method is capable to enhance the design procedure from the point of view of obtained frequency response and of the computational performance.

Computer Speech & Language | 2018