Publication


Featured research published by Carlo Drioli.


Journal of the Acoustical Society of America | 2005

A flow waveform-matched low-dimensional glottal model based on physical knowledge

Carlo Drioli

The purpose of this study is to explore the possibility for physically based mathematical models of the voice source to accurately reproduce inverse filtered glottal volume-velocity waveforms. A low-dimensional, self-oscillating model of the glottal source with waveform-matching properties is proposed. The model relies on a lumped mechano-aerodynamic scheme loosely inspired by the one- and multimass lumped models. The vocal folds are represented by a single mechanical resonator and a propagation line which takes into account the vertical phase differences. The vocal-fold displacement is coupled to the glottal flow by means of an aerodynamic driving block which includes a general parametric nonlinear component. The principal characteristics of the flow-induced oscillations are retained, and the overall model is able to match inverse-filtered glottal flow signals. The method offers in principle the possibility of performing transformations of the glottal flow by acting on the physiologically based parameters of the model. This is a desirable property, e.g., for speech synthesis applications. The model was tested on a data set which included inverse-filtered glottal flow waveforms of different characteristics. The results demonstrate the possibility of reproducing natural speech waveforms with high accuracy, and of controlling important characteristics of the synthesis such as pitch.


Speech Communication | 2004

Modifications of phonetic labial targets in emotive speech: effects of the co-production of speech and emotions

Emanuela Magno Caldognetto; Piero Cosi; Carlo Drioli; Graziano Tisato; Federica Cavicchio

This paper describes how the visual and acoustic characteristics of some Italian phones (/’a/, /b/, /v/) are modified in emotive speech by the expression of joy, surprise, sadness, disgust, anger, and fear. In this research we specifically analyze the interaction between labial configurations, peculiar to each emotion, and the articulatory lip movements of the Italian vowel /’a/ and consonants /b/ and /v/, defined by phonetic-phonological rules. This interaction was quantified examining the variations of the following parameters: lip opening, upper and lower lip vertical displacements, lip rounding, anterior/posterior movements (protrusion) of upper lip and lower lip, left and right lip corner horizontal displacements, left and right corner vertical displacements, and asymmetry parameters calculated as the difference between right and left corner position along the horizontal and the vertical axes. Moreover, we present the correlations between articulatory data and the spectral features of the co-produced acoustic signal.


Pattern Recognition | 2011

Generative modeling and classification of dialogs by a low-level turn-taking feature

Marco Cristani; Anna Pesarin; Carlo Drioli; Alessandro Tavano; Alessandro Perina; Vittorio Murino

In the last few years, growing attention has been paid to the problem of human-human communication, trying to devise artificial systems able to mediate a conversational setting between two or more people. In this paper, we propose an automatic system based on a generative structure able to classify dialog scenarios. The generative model is composed by integrating a Gaussian mixture model and an (observed) Markovian influence model, and it is fed with a novel low-level acoustic feature termed steady conversational period (SCP). SCPs are built on the duration of continuous slots of silence or speech, also taking into account conversational turn-taking. The interactional dynamics built upon the transitions among SCPs provide a behavioral blueprint of conversational settings without relying on segmental or continuous phonetic features, and may be important for predicting the evolution of typical conversational situations in different dialog scenarios. The model has been tested on an extensive set of real, dyadic and multi-person conversational settings, including a recent dyadic dataset and the AMI meeting corpus. Comparative tests are made using conventional acoustic features and classification methods, showing that the proposed scheme provides superior classification performance for all conversational settings in our datasets. Moreover, we show that our approach is able to characterize the nature of multi-person conversation (namely, the role of the participants) in a very accurate way, thus demonstrating great versatility.
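As a rough illustration of the "continuous slots of silence or speech" mentioned in the abstract, the sketch below collapses a per-frame speech/silence sequence for one speaker into labeled runs with durations. The function name `steady_periods`, the frame length, and the input format are assumptions made for illustration; the turn-taking information that the abstract says SCPs also encode is not modeled here.

```python
from itertools import groupby

def steady_periods(vad_frames, frame_seconds=0.01):
    """Collapse per-frame speech (1) / silence (0) flags into (label, duration) runs.

    Hypothetical helper: the abstract does not specify how SCPs are computed,
    and turn-taking between speakers is deliberately left out of this sketch.
    """
    periods = []
    for is_speech, run in groupby(vad_frames):
        n_frames = sum(1 for _ in run)  # length of this uninterrupted run
        label = "speech" if is_speech else "silence"
        periods.append((label, n_frames * frame_seconds))
    return periods

# Toy voice-activity sequence with 0.5 s frames (made-up data).
frames = [0, 0, 1, 1, 1, 0, 1, 1]
periods = steady_periods(frames, frame_seconds=0.5)
for label, duration in periods:
    print(label, duration)
```

A sequence of such runs is the kind of input on which transition statistics could then be estimated.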


ACM Multimedia | 2010

Toward an automatically generated soundtrack from low-level cross-modal correlations for automotive scenarios

Marco Cristani; Anna Pesarin; Carlo Drioli; Vittorio Murino; Antonio Rodà; Michele Grapulin; Nicu Sebe

In this paper, we propose a novel recommendation policy for driving scenarios. While driving a car, listening to an audio track may enrich the atmosphere, conveying emotions that let the driver sense a more arousing experience. Here, we introduce a recommendation policy that, given a video sequence taken by a camera mounted onboard a car, chooses the most suitable audio piece from a predetermined set of melodies. The mixing mechanism takes inspiration from a set of generic qualitative aesthetical rules for cross-modal linking, realized by associating audio and video features. The contribution of this paper is to translate such qualitative rules into quantitative terms, learning cross-modal statistical correlations from an extensive training dataset and validating them thoroughly. In this way, we are able to identify which audio and video features correlate best (i.e., promoting or rejecting some aesthetical rules), and how strong their correlations are. This knowledge is then employed for the realization of the recommendation policy. A set of user studies illustrates and validates the policy, thus encouraging further developments toward a real implementation in an automotive application.


EURASIP Journal on Advances in Signal Processing | 2001

Radial basis function networks for conversion of sound spectra

Carlo Drioli

In many advanced signal processing tasks, such as pitch shifting, voice conversion or sound synthesis, accurate spectral processing is required. Here, the use of Radial Basis Function Networks (RBFN) is proposed for the modeling of the spectral changes (or conversions) related to the control of important sound parameters, such as pitch or intensity. The identification of such conversion functions is based on a procedure which learns the shape of the conversion from a few pairs of target spectra from a data set. The generalization properties of RBFNs provide for interpolation with respect to the pitch range. In the construction of the training set, mel-cepstral encoding of the spectrum is used to capture the perceptually most relevant spectral changes. Moreover, a singular value decomposition (SVD) approach is used to reduce the dimension of conversion functions. The RBFN conversion functions introduced are characterized by a perceptually-based fast training procedure, desirable interpolation properties and computational efficiency.
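To make the interpolation idea in the abstract concrete, here is a minimal sketch of a Gaussian radial basis function network that learns a scalar map from pitch to a single spectral coefficient from a handful of examples. It is not the paper's procedure: the mel-cepstral encoding and SVD reduction are omitted, the centers are simply placed on the training pitches, and all names and data values (`fit_rbfn`, `predict`, `pitches`, `coeffs`, `width`) are hypothetical.

```python
import math

def gaussian(r, width):
    """Gaussian radial basis function of distance r."""
    return math.exp(-(r / width) ** 2)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_rbfn(centers, targets, width):
    """Find output weights so the network passes through every training pair."""
    phi = [[gaussian(abs(x - c), width) for c in centers] for x in centers]
    return solve(phi, targets)

def predict(x, centers, weights, width):
    """Evaluate the network at input x."""
    return sum(w * gaussian(abs(x - c), width) for c, w in zip(centers, weights))

# Made-up training pairs: pitch in Hz -> one spectral coefficient.
pitches = [100.0, 150.0, 200.0, 250.0]
coeffs = [0.2, 0.5, 0.4, 0.1]
width = 50.0  # assumed basis width

weights = fit_rbfn(pitches, coeffs, width)
print(predict(125.0, pitches, weights, width))  # value at an unseen pitch
```

With one center per training point the network reproduces the training values exactly and varies smoothly in between, which is the interpolation property the abstract relies on.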


Signal Processing | 2003

Orthogonal least squares algorithm for the approximation of a map and its derivatives with a RBF network

Carlo Drioli; Davide Rocchesso

Radial basis function networks (RBFNs) are used primarily to solve curve-fitting problems and for non-linear system modeling. Several algorithms are known for the approximation of a non-linear curve from a sparse data set by means of RBFNs. Regularization techniques allow constraints on the smoothness of the curve to be defined by using the gradient of the function in the training. However, procedures that make it possible to set the value of the derivatives for the data arbitrarily are rarely found in the literature. In this paper, the orthogonal least squares (OLS) algorithm for the identification of RBFNs is modified to provide the approximation of a non-linear single-input single-output map along with its derivatives, given a set of training data. The interest in the derivatives of non-linear functions concerns many identification and control tasks where the study of system stability and robustness is addressed. The effectiveness of the proposed algorithm is demonstrated with examples in the field of data interpolation and control of non-linear dynamical systems.


International Conference on Information Technologies for Performing Arts, Media Access, and Entertainment | 2013

Networked Performances and Natural Interaction via LOLA: Low Latency High Quality A/V Streaming System

Carlo Drioli; Claudio Allocchio; Nicola Buso

We present LOLA (LOw LAtency audio visual streaming system), a system for distributed performing arts interaction over advanced packet networks. It is intended to operate on high performance networking infrastructures, and is based on low latency audio/video acquisition hardware and on the integration and optimization of audio/video data acquisition, presentation and transmission. The extremely low round trip delay of the transmitted data makes the system suitable for remote musical education, real time distributed musical performance and performing arts activities, but in general also for any human-human interactive distributed activity in which timing and responsiveness are critical factors for the quality of the interaction. The experimentation conducted so far with professional music performers and skilled music students, on geographical distances up to 3500 km, demonstrated its effectiveness and suitability for distance musical interaction, even when professional players are involved and very "tempo sensitive" classical baroque music repertoire is concerned.


Computer Vision and Pattern Recognition | 2009

Auditory dialog analysis and understanding by generative modelling of interactional dynamics

Marco Cristani; Anna Pesarin; Carlo Drioli; Alessandro Tavano; Alessandro Perina; Vittorio Murino

In the last few years, the interest in the analysis of human behavioral schemes has dramatically grown, in particular for the interpretation of the communication modalities called social signals. They represent well defined interaction patterns, possibly unconscious, characterizing different conversational situations and behaviors in general. In this paper, we illustrate an automatic system based on a generative structure able to analyze conversational scenarios. The generative model is composed by integrating a Gaussian mixture model and the (observed) influence model, and it is fed with a novel kind of simple low-level auditory social signals, which are termed steady conversational periods (SCPs). These are built on the duration of continuous slots of silence or speech, also taking into account conversational turn-taking. The interactional dynamics built upon the transitions among SCPs provide a behavioral blueprint of conversational settings without relying on segmental or continuous phonetic features. Our contribution here is to show the effectiveness of our model when applied to dialog classification and clustering tasks, considering dialogs between adults and between children and adults, in both flat and arguing discussions, and showing excellent performance also in comparison with state-of-the-art frameworks.


International Conference on Pattern Recognition | 2008

A statistical signature for automatic dialogue classification

Anna Pesarin; Marco Cristani; Vittorio Murino; Carlo Drioli; Alessandro Perina; Alessandro Tavano

In the last few years, some attention has been paid to the problem of human-human communication, trying to devise artificial systems able to mediate a conversational setting between two or more people. In this paper, we designed an automatic system based on a generative structure able to classify hard dialog acts. The generative model is composed by integrating a hierarchical Gaussian mixture model and the Influence Model, giving rise to a new method able to deal with such difficult scenarios. The method has been tested on a set of conversational settings involving dialogues between adults and between children and adults, in flat and arguing discussions, yielding very accurate classification results.


IEEE Signal Processing Letters | 2014

Incoherent Frequency Fusion for Broadband Steered Response Power Algorithms in Noisy Environments

Daniele Salvati; Carlo Drioli; Gian Luca Foresti

The steered response power (SRP) algorithms have been shown to be among the most effective and robust ones in noisy environments for direction of arrival (DOA) estimation. In broadband signal applications, the SRP methods typically perform their computations in the frequency-domain by applying a fast Fourier transform (FFT) on a signal portion, calculating the response power on each frequency bin, and subsequently fusing these estimates to obtain the final result. We introduce a frequency response incoherent fusion method based on a normalized arithmetic mean (NAM). Experiments are presented that rely on the SRP algorithms for the localization of motor vehicles in a noisy outdoor environment, focusing our discussion on performance differences with respect to different signal-to-noise ratios (SNR), and on spatial resolution issues for closely spaced sources. We demonstrate that the proposed fusion method provides higher resolution for the delay-and-sum SRP, and improved performances for minimum variance distortionless response (MVDR) and multiple signal classification (MUSIC).
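The sketch below illustrates one plausible reading of incoherent fusion with a normalized arithmetic mean: each frequency bin's response power map is divided by its own maximum over the steering directions before the maps are averaged, so that a single loud bin cannot dominate the fused result. The abstract does not spell out the normalization, so this choice, the function name `nam_fusion`, and the toy power values are all assumptions.

```python
def nam_fusion(power_maps):
    """Fuse narrowband power maps with a normalized arithmetic mean.

    power_maps[k][d] is the response power of frequency bin k at steering
    direction d. Normalizing each bin by its own peak is an assumed reading
    of "normalized arithmetic mean", not a confirmed detail of the paper.
    """
    n_dirs = len(power_maps[0])
    fused = [0.0] * n_dirs
    for bin_map in power_maps:
        peak = max(bin_map)
        if peak <= 0.0:
            continue  # a bin with no power contributes nothing
        for d, p in enumerate(bin_map):
            fused[d] += p / peak
    # Average over all bins, including any that were skipped above.
    return [v / len(power_maps) for v in fused]

# Toy maps: 3 frequency bins x 3 steering directions (made-up numbers).
maps = [
    [1.0, 4.0, 2.0],
    [10.0, 50.0, 100.0],  # much louder bin peaking at a different direction
    [0.5, 1.0, 0.25],
]
fused = nam_fusion(maps)
plain = [sum(col) / len(maps) for col in zip(*maps)]  # unnormalized mean

print("NAM peak direction:  ", fused.index(max(fused)))
print("plain peak direction:", plain.index(max(plain)))
```

In this toy case the unnormalized mean follows the loud bin, while the normalized mean follows the direction on which most bins agree.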

Collaboration


Top Co-Authors

Piero Cosi
National Research Council

Graziano Tisato
National Research Council

Davide Rocchesso
Ca' Foscari University of Venice

Fabio Tesser
National Research Council

Vittorio Murino
Istituto Italiano di Tecnologia