Marco Liuni
IRCAM
Publications
Featured research published by Marco Liuni.
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Marco Liuni; Axel Röbel; Ewa Matusiak; Marco Romito; Xavier Rodet
We present an algorithm for sound analysis and re-synthesis with local, automatic adaptation of time-frequency resolution. The reconstruction formula we propose is highly efficient and gives a good approximation of the original signal from analyses with different time-varying resolutions in complementary frequency bands: this is a typical case in which perfect reconstruction cannot, in general, be achieved with fast algorithms, so there is a reconstruction error to be minimized. We provide a theoretical upper bound for the reconstruction error of our method, and an example of automatic adaptive analysis and re-synthesis of a musical sound.
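A minimal sketch of the band-split reconstruction scenario this abstract describes (not the paper's algorithm): two STFT analyses with different window lengths are each assigned a complementary frequency band, the signal is re-synthesized from both, and the resulting reconstruction error is measured. The window lengths, the 2 kHz band boundary, and the test signal are illustrative assumptions.

```python
# Sketch: re-synthesis from two analyses with different resolutions assigned
# to complementary frequency bands; perfect reconstruction is not expected.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

split_hz = 2000                          # boundary between the two analysis bands
nperseg_low, nperseg_high = 2048, 256    # long window below, short window above

def band_resynthesis(x, fs, nperseg, keep_low):
    """Analyze x with one resolution, keep only one band, and re-synthesize."""
    f, _, Z = stft(x, fs=fs, nperseg=nperseg)
    mask = (f <= split_hz) if keep_low else (f > split_hz)
    Z[~mask, :] = 0.0                    # discard the complementary band
    _, x_rec = istft(Z, fs=fs, nperseg=nperseg)
    x_rec = np.pad(x_rec, (0, max(0, len(x) - len(x_rec))))[:len(x)]
    return x_rec

# Each band is reconstructed from the analysis whose resolution was assigned to it.
x_hat = band_resynthesis(x, fs, nperseg_low, keep_low=True) \
      + band_resynthesis(x, fs, nperseg_high, keep_low=False)

rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_err:.3e}")  # non-zero: only approximate
```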
international conference on acoustics, speech, and signal processing | 2015
Axel Roebel; Jordi Pons; Marco Liuni; Mathieu Lagrange
This paper presents an investigation into the detection and classification of drum sounds in polyphonic music and drum loops using non-negative matrix deconvolution (NMD) and the Itakura-Saito divergence. The Itakura-Saito divergence has recently been proposed as especially appropriate for decomposing audio spectra because it is scale invariant, but it has not yet been widely adopted. The article presents new contributions to audio event detection methods using the Itakura-Saito divergence that improve efficiency and numerical stability and simplify the generation of target pattern sets. A new approach for handling background sounds is proposed and, moreover, a new detection criterion based on estimating the perceptual presence of the target-class sources is introduced. Experimental results obtained for drum detection in polyphonic music and drum soli demonstrate the beneficial effects of the proposed extensions.
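A rough, hedged illustration of decomposing a power spectrogram under the Itakura-Saito divergence, using scikit-learn's plain NMF rather than the NMD algorithm of the paper; the file name, rank, and detection threshold are assumptions made for the example.

```python
# Sketch: IS-NMF of a power spectrogram, plus a naive activation-based
# detection rule as a stand-in for the paper's perceptual-presence criterion.
import numpy as np
import librosa
from sklearn.decomposition import NMF

y, sr = librosa.load("drum_loop.wav", sr=None, mono=True)        # hypothetical file
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256)) ** 2     # power spectrogram

model = NMF(n_components=8, beta_loss="itakura-saito",
            solver="mu", init="nndsvda", max_iter=500)
W = model.fit_transform(S + 1e-10)     # spectral templates (freq x components)
H = model.components_                  # activations (components x frames)

# Naive "detection": frames where component 0 dominates the reconstruction.
target = W[:, 0:1] @ H[0:1, :]                                   # component 0 only
presence = target.sum(axis=0) / ((W @ H).sum(axis=0) + 1e-12)
onsets = np.flatnonzero(presence > 0.5) * 256 / sr
print("candidate event times (s):", np.round(onsets, 3))
```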
Behavior Research Methods | 2018
Laura Rachman; Marco Liuni; Pablo Arias; Andreas Lind; Petter Johansson; Lars Hall; Daniel C. Richardson; Katsumi Watanabe; Stéphanie Dubal; Jean-Julien Aucouturier
We present an open-source software platform that transforms emotional cues expressed by speech signals using audio effects such as pitch shifting, inflection, vibrato, and filtering. The emotional transformations can be applied to any audio file, but can also run in real time, using live input from a microphone, with less than 20 ms of latency. We anticipate that this tool will be useful for the study of emotions in psychology and neuroscience, because it enables a high level of control over the acoustical and emotional content of experimental stimuli in a variety of laboratory situations, including real-time social situations. We present here the results of a series of validation experiments assessing the tool against several methodological requirements: that the transformed emotions be recognized at above-chance levels, that they remain valid in several languages (French, English, Swedish, and Japanese), and that their naturalness be comparable to that of natural speech.
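For orientation only, a crude offline sketch of two of the cues listed above (pitch shift and filtering); it is not the real-time platform presented in the paper, and the file name and parameter values are assumptions.

```python
# Sketch: approximate a "happy"-like cue offline by raising pitch ~50 cents
# and applying a gentle low-cut filter. Not the paper's real-time tool.
import librosa
import soundfile as sf
from scipy.signal import butter, sosfilt

y, sr = librosa.load("neutral_utterance.wav", sr=None)       # hypothetical file

y_up = librosa.effects.pitch_shift(y, sr=sr, n_steps=0.5)    # +0.5 semitone
sos = butter(2, 150, btype="highpass", fs=sr, output="sos")  # gentle low-cut
y_happy = sosfilt(sos, y_up)

sf.write("utterance_happy_approx.wav", y_happy, sr)
```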
spoken language technology workshop | 2012
Nicolas Obin; Marco Liuni
This paper introduces an entropy-based spectral representation as a measure of the degree of noisiness in audio signals, complementary to the standard MFCCs for audio and speech recognition. The proposed representation is based on the Rényi entropy, a generalization of the Shannon entropy. For audio signal representation, the Rényi entropy has the advantage of being able to focus either on the harmonic content (prominent amplitudes within a distribution) or on the noise content (equal distribution of amplitudes). The proposed representation outperforms all other noisiness measures, including the Shannon and Wiener entropies, in a large-scale classification of vocal effort (whispered-soft/normal/loud-shouted) in the real-world scenario of multi-language, massive role-playing video games. The improvement is around 10% in relative error reduction and is particularly significant for the recognition of noisy speech, i.e., whispery/breathy speech. This confirms the role of noisiness in speech recognition; the work will further be extended to the classification of voice quality for the design of an automatic voice-casting system for video games.
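A minimal sketch of a frame-wise Rényi entropy computed from a normalized magnitude spectrum, illustrating the kind of noisiness descriptor described above; the alpha value, window settings, and file name are assumptions rather than the paper's configuration.

```python
# Sketch: Rényi entropy of each spectral frame, treated as a probability
# distribution; peaky (harmonic) frames score low, flat (noisy) frames high.
import numpy as np
import librosa

def renyi_entropy(p, alpha=2.0, eps=1e-12):
    """Rényi entropy H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha)."""
    p = p / (p.sum() + eps)
    return np.log(np.sum(p ** alpha) + eps) / (1.0 - alpha)

y, sr = librosa.load("speech.wav", sr=None)                  # hypothetical file
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

noisiness = np.array([renyi_entropy(frame) for frame in S.T])  # one value per frame
print(noisiness.shape, noisiness.min(), noisiness.max())
```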
international conference on acoustics, speech, and signal processing | 2014
Yuki Mitsufuji; Marco Liuni; Alex Baker; Axel Roebel
The following article describes research on source detection in multichannel (3DTV) audio streams. The problem is highly complex because multiple layers can be present in a scene (background music, ambience, commentary). In this work a new algorithm is developed that exploits the information from the different audio channels to detect, and possibly localize and separate, independent audio sources. An algorithm based on online non-negative tensor deconvolution is devised to deal with sound sources whose positions in the channel matrix vary over time. The evaluation is carried out on 3DTV 5.1 film soundtracks and on synthetic mixes of 3DTV 5.1 audio with target sounds from a sound-effects database: a significant improvement in detection performance is shown compared with other decomposition techniques.
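As a loose illustration of exploiting the channel dimension, the sketch below factors a channel x frequency x time magnitude tensor with a PARAFAC-style non-negative decomposition (not the online non-negative tensor deconvolution of the paper) and flags frames where one component's activation is unusually high; the file name, rank, and threshold are assumptions.

```python
# Sketch: non-negative tensor factorization of a multichannel magnitude
# tensor, followed by a crude activation-threshold "detection" of one component.
import numpy as np
import librosa
import soundfile as sf
from tensorly.decomposition import non_negative_parafac

x, sr = sf.read("soundtrack_5_1.wav")                 # hypothetical multichannel file
mags = [np.abs(librosa.stft(np.ascontiguousarray(x[:, c]),
                            n_fft=1024, hop_length=512))
        for c in range(x.shape[1])]
T = np.stack(mags)                                    # (channels, freq, frames)

weights, (A, W, H) = non_negative_parafac(T, rank=6, n_iter_max=200, init="random")
# A: per-channel gains, W: spectral templates, H: temporal activations

k = 0                                                 # component assumed to be the target
act = H[:, k]
detected = np.flatnonzero(act > act.mean() + 2 * act.std())
print("frames flagged for component", k, ":", detected[:20])
```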
bioRxiv | 2018
Juan José Burred; Emmanuel Ponsot; Louise Goupil; Marco Liuni; Jean-Julien Aucouturier
Over the past few years, the field of visual social cognition and face processing has been dramatically impacted by a series of data-driven studies employing computer-graphics tools to synthesize arbitrary meaningful facial expressions. In the auditory modality, reverse correlation is traditionally used to characterize sensory processing at the level of spectral or spectro-temporal stimulus properties, but not higher-level cognitive processing of, e.g., words, sentences, or music, for lack of tools able to manipulate the stimulus dimensions that are relevant for these processes. Here, we present an open-source audio-transformation toolbox, called CLEESE, able to systematically randomize the prosody/melody of existing speech and music recordings. CLEESE works by cutting recordings into short successive time segments (e.g., every successive 100 milliseconds in a spoken utterance) and applying a random parametric transformation to each segment's pitch, duration, or amplitude, using a new Python-language implementation of the phase-vocoder digital audio technique. We present here two applications of the tool to generate stimuli for studying intonation processing of interrogative vs. declarative speech, and rhythm processing of sung melodies.
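A minimal sketch of the segment-wise randomization idea, not the CLEESE implementation itself (which relies on a phase vocoder with smooth breakpoint functions): the recording is cut into successive segments and each one is pitch-shifted by a random amount; the file name, segment length, and spread are assumptions.

```python
# Sketch: cut an utterance into successive segments and apply a random
# pitch shift (in semitones) to each, then concatenate the result.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("utterance.wav", sr=None)    # hypothetical file
seg_len = int(0.2 * sr)                           # 200-ms segments
rng = np.random.default_rng(0)

out = []
for start in range(0, len(y), seg_len):
    seg = y[start:start + seg_len]
    if len(seg) < 2048:                           # leave a too-short tail unchanged
        out.append(seg)
        continue
    shift = rng.normal(0.0, 1.0)                  # random shift, sigma = 1 semitone
    out.append(librosa.effects.pitch_shift(seg, sr=sr, n_steps=shift))

sf.write("utterance_random_prosody.wav", np.concatenate(out), sr)
```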
IEEE Transactions on Audio, Speech, and Language Processing | 2018
Elie Laurent Benaroya; Nicolas Obin; Marco Liuni; Axel Roebel; Wilson Raumel; Sylvain Argentieri
This paper presents non-negative factorization of audio signals for the binaural localization of multiple sound sources within realistic and unknown sound environments. Non-negative tensor factorization (NTF) provides a sparse representation of multichannel audio signals in time, frequency, and space that can be exploited in computational audio scene analysis and robot audition for the separation and localization of sound sources. In the proposed formulation, each sound source is represented by means of spectral dictionaries, temporal activations, and its distribution within each channel (here, the left and right ears). This distribution, being frequency-dependent, can be interpreted as an explicit estimate of the Head-Related Transfer Function (HRTF) of a binaural head, which can then be converted into the estimated position of the sound source. Moreover, the semi-supervised formulation of the non-negative factorization allows us to integrate prior knowledge about some sound sources of interest, whose dictionaries can be learned in advance, whereas the remaining sources are considered background sound, which remains unknown and is estimated on the fly. The proposed NTF-based method is applied here to the binaural localization of multiple speakers within realistic sound environments.
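A small sketch of the final step described above, assuming per-frequency left/right gains have already been estimated for one NTF component: the gains are converted into an interaural level difference (ILD) and a crude left/center/right decision. The gain values below are synthetic placeholders, not outputs of the paper's method.

```python
# Sketch: map one component's left/right, frequency-dependent gains to an
# interaural level difference and a rough lateralization decision.
import numpy as np

n_freq = 257
g_left = np.full(n_freq, 0.9)     # hypothetical per-frequency gains (left ear)
g_right = np.full(n_freq, 0.4)    # hypothetical per-frequency gains (right ear)

ild_db = 20 * np.log10((g_left + 1e-12) / (g_right + 1e-12))
mean_ild = ild_db.mean()

if mean_ild > 3:
    side = "left"
elif mean_ild < -3:
    side = "right"
else:
    side = "center"
print(f"mean ILD = {mean_ild:.1f} dB -> source roughly on the {side}")
```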
bioRxiv | 2017
Laura Rachman; Marco Liuni; Pablo Arias; Andreas Lind; Petter Johansson; Lars Hall; Daniel C. Richardson; Katsumi Watanabe; Stéphanie Dubal; Jean-Julien Aucouturier
We present an open-source software platform that transforms the emotions expressed by speech signals using audio effects such as pitch shifting, inflection, vibrato, and filtering. The emotional transformations can be applied to any audio file, but can also run in real time (with less than 20 ms of latency), using live input from a microphone. We anticipate that this tool will be useful for the study of emotions in psychology and neuroscience, because it enables a high level of control over the acoustical and emotional content of experimental stimuli in a variety of laboratory situations, including real-time social situations. We present here the results of a series of validation experiments showing that transformed emotions are recognized at above-chance levels in French, English, Swedish, and Japanese, with a naturalness comparable to that of natural speech. We then provide a list of twenty-five experimental ideas applying this new tool to important topics in the behavioral sciences.
Musica/Tecnologia | 2013
Marco Liuni; Axel Röbel
arXiv: Sound | 2011
Marco Liuni; Peter Balazs; Axel Röbel