Publication


Featured research published by Andrea Paoloni.


international conference on acoustics, speech, and signal processing | 1991

A recurrent time-delay neural network for improved phoneme recognition

F. Greco; Andrea Paoloni; G. Ravaioli

The authors propose a modification to the structure of the time-delay neural network (TDNN), obtained through feedback at the first hidden layer. The experiment carried out with the new model, called RTDNN (recurrent TDNN), consists of the classification of the unvoiced plosive phonemes. These were extracted from initial and intermediate positions in a list of the most common Italian words, uttered by a male speaker, thus obtaining 250 tokens per phoneme. The training was carried out through a modified variant of back propagation, known as BPS (back propagation for sequences), using half of the tokens for learning and the remainder for testing. The error rate trend thus obtained shows a 27% decrease in a particular range of the magnitude of feedback, with values ranging from 5% for the original TDNN model with no feedback to 3.6% for the proposed RTDNN model.


IEEE Journal on Selected Areas in Communications | 1991

New directions in the evaluation of voice input/output systems

Cristina Delogu; Andrea Paoloni; Paolo Pocci

A review of new directions in the evaluation of voice input/output systems is presented. Furthermore, recent studies on human factor aspects involved in speech communication are examined, with the idea that human factors research can improve current voice input/output devices and help to experimentally determine implementation guidelines for developing voice interface design.


international conference on spoken language processing | 1996

Spectral analysis of synthetic speech and natural speech with noise over the telephone line

Cristina Delogu; Andrea Paoloni; Susanna Ragazzini; Paola Ridolfi

In order to explain the different performance obtained with natural and synthetic speech at different linguistic levels over the telephone line, we analyzed the data collected in an experiment where 108 randomized stimuli were presented to 96 subjects. Subjects were required to identify the consonant in 51 CV and 57 VCV meaningful or meaningless words. There were 20 different listening conditions: 6 TTS systems (3 formant-based (SF) and 3 diphone-based (SD)), a pure natural voice (NV) and 3 signal-to-noise (S/N) ratios (6, 0, and -6 dB) for a total of 10 systems, presented both in good and in telephone conditions. The comparison between consonant confusions for natural and synthetic speech with comparable overall levels of intelligibility performance showed that the distributions of the consonant confusions for natural and synthetic speech were often quite different in each condition. Some analyses of different spectrograms suggest that such confusions are due to some problems in the phonetic rules and to the telephone line.


affective computing and intelligent interaction | 2009

Transmission of vocal emotion: Do we have to care about the listener? The case of the Italian speech corpus EMOVO

Carlo Giovannella; Davide Conflitti; Riccardo Santoboni; Andrea Paoloni

The evaluation of emotionally colored nonsense sentences contained in the Italian vocal database EMOVO has been performed by means of a new testing tool based on Plutchik's finite-state model of emotions. The validation of the corpus also took into account the ability of the listeners to recognize a given emotion. Such a detailed analysis allowed us to identify the unreliable listeners and to carry out a more accurate assessment of the vocal database and of the speakers.


international workshop on machine learning for signal processing | 2012

Single-sided objective speech intelligibility assessment based on Sparse signal representation

Giovanni Costantini; Massimiliano Todisco; Renzo Perfetti; Andrea Paoloni; Giovanni Saggio

Transcription of speech signals, originating from a lawful interception, is particularly important in the forensic phonetics framework. These signals are often degraded and the transcript may not replicate what was actually pronounced. In the absence of the clean signal, the only way to estimate the level of accuracy that can be obtained in the transcription is to develop an objective methodology for intelligibility measurements. In this paper a method based on the Normalized Spectrum Envelope (NSE) and Sparse Non-negative Matrix Factorization (SNMF) is proposed to evaluate the signal intelligibility. The approaches are tested with three different noise types and the results are compared with the speech intelligibility scores measured by subjective tests. The results of the experiments show a high correlation between objective measurements and subjective evaluations. Therefore, the proposed methodology can be successfully used in order to establish whether a given intercepted signal can be transcribed with sufficient reliability.


international conference on spoken language processing | 1996

Predictive neural networks in text independent speaker verification: an evaluation on the SIVA database

Andrea Paoloni; Susanna Ragazzini; Giacomo Ravaioli

The authors propose a system which combines the use of predictive neural networks and the statistical approach in the task of text-independent speaker verification through a telephone line. The system is composed of a predictive neural network for every reference speaker, which is trained with the backpropagation algorithm and the maximum likelihood criterion, in order to obtain the highest probability that the input to the network belongs to the reference speaker. They also consider a global network trained on the whole training set, whose likelihood gives a measure of the predictability of a given input, with the aim of eliminating the strong dependence of the score on the particular input considered. The evaluation of the system is carried out on a subset of the Italian telephonic database SIVA, purposely collected for the considered task.
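The normalization idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two likelihoods are combined, so the log-likelihood difference, the isotropic Gaussian error model with fixed variance, and the names gaussian_log_likelihood, normalized_score, and verify are all assumptions made for illustration. The inputs are the prediction residuals (observed frame minus predicted frame) produced by the speaker-specific predictor and by the global predictor.

```python
import numpy as np

def gaussian_log_likelihood(residuals, variance=1.0):
    """Mean per-frame log-likelihood of prediction residuals.

    residuals: array of shape (frames, dims).
    Assumes an isotropic Gaussian error model (an illustrative choice).
    """
    residuals = np.asarray(residuals, dtype=float)
    dims = residuals.shape[1]
    squared_error = np.sum(residuals ** 2, axis=1)
    per_frame = -0.5 * (dims * np.log(2 * np.pi * variance)
                        + squared_error / variance)
    return float(per_frame.mean())

def normalized_score(speaker_residuals, global_residuals, variance=1.0):
    """Speaker log-likelihood minus global log-likelihood.

    Subtracting the global term reduces the dependence of the score
    on how predictable the particular input happens to be.
    """
    return (gaussian_log_likelihood(speaker_residuals, variance)
            - gaussian_log_likelihood(global_residuals, variance))

def verify(speaker_residuals, global_residuals, threshold=0.0):
    """Accept the claimed identity when the normalized score exceeds
    a threshold (the threshold value here is hypothetical)."""
    return normalized_score(speaker_residuals, global_residuals) > threshold
```

An utterance that is easy to predict for every speaker raises both likelihoods, so the difference stays roughly unchanged; only an advantage of the speaker-specific model over the global one moves the score.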


International Journal of Speech Technology | 1997

A field evaluation of the Italian “Automated reverse directory assistance” service

Cristina Delogu; Andrea Paoloni; Paola Ridolfi; Ciro Sementina

The paper describes a field evaluation of the automated ‘reverse directory assistance’ service presently in use in Italy, in which information about names and addresses is provided by a TTS system. A simulation of the service using a natural voice was also run to get comparative data. Both services were accessed from an office room and a call-box on the street. Different evaluation metrics, such as intelligibility, task completion, task correctness, transaction success, and users' reactions, were used. The aim of the work was to evaluate TTS synthesis in real-world use and to make a comparison between laboratory data and data on system performance in a real application. Such a comparison suggested that laboratory tests should pay more attention to simulating the conditions that can be predicted in real-world use, by including important aspects that are generally not taken into consideration in laboratory tests and that are likely to have a large influence on TTS system performance, such as environmental noise, prosody, and task complexity. The results also underline the importance of field evaluations to get an overall view of the usability of a service in real applications and with users who are as similar as possible to actual users.


AVBPA '97 Proceedings of the First International Conference on Audio- and Video-Based Biometric Person Authentication | 1997

Text Independent Speaker Verification Using Multiple-state Predictive Neural Networks

Andrea Paoloni; Susy Ragazzini; Giacomo Ravaioli

In this paper we propose a system which combines the use of predictive neural networks and the statistical approach in the task of text-independent speaker verification through the telephone line.


international conference on acoustics, speech, and signal processing | 1991

An acoustical pattern classifier based on N-depth projection on privileged eigenstructures

Mauro Falcone; Andrea Paoloni

A geometrical vector classifier is applied to the problem of phonetic classification in several experimental environments. The algorithm is based on the measure of similarity between the original vector and the ones reconstructed using an N-depth projection on the eigenvectors related to the covariance matrix of each category to be classified. For each category (i.e. for each phoneme) there is a privileged subspace of arbitrary dimension and with N axes where the similarity of the training vector set is maximized. These geometrical subspaces are characterized in relation to databases, speaker dependence, speech emission, and signal parametrization. Experiments were performed using three small databases: a four-speaker continuous speech database, a single-speaker isolated words database, and a single-speaker continuous speech database. Results are reported for closed tests (where training and classification were performed on the same database), and for open tests (where they were performed on different databases). It is concluded that the proposed method may, in some cases, successfully substitute for vector quantizer techniques.
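The projection-based classification described in this abstract can be sketched roughly as follows. This is an illustrative reading, not the paper's algorithm: centering each vector on its class mean and using the Euclidean reconstruction error as the (inverse) similarity measure are assumptions, and the names fit_subspaces, classify, and n_depth are hypothetical.

```python
import numpy as np

def fit_subspaces(X, y, n_depth):
    """For each class, store its mean and the n_depth eigenvectors of the
    class covariance matrix with the largest eigenvalues."""
    models = {}
    for label in np.unique(y):
        class_vectors = X[y == label]
        mean = class_vectors.mean(axis=0)
        cov = np.cov(class_vectors, rowvar=False)
        # eigh returns eigenvalues in ascending order, so reverse the columns
        _, eigvecs = np.linalg.eigh(cov)
        basis = eigvecs[:, ::-1][:, :n_depth]
        models[label] = (mean, basis)
    return models

def classify(x, models):
    """Reconstruct x from its projection on each class subspace and pick
    the class whose reconstruction is closest to the original vector."""
    best_label, best_error = None, np.inf
    for label, (mean, basis) in models.items():
        centered = x - mean
        reconstructed = basis @ (basis.T @ centered)
        error = np.linalg.norm(centered - reconstructed)
        if error < best_error:
            best_label, best_error = label, error
    return best_label
```

Here X is an array of training vectors with one row per vector and y holds the class label of each row. A larger n_depth lets every subspace reconstruct any vector more faithfully, so the classes become harder to tell apart as n_depth approaches the full dimension of the feature space.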


Speech Communication | 2000

Subjective age estimation of telephonic voices

Loredana Cerrato; Mauro Falcone; Andrea Paoloni

Collaboration


Dive into Andrea Paoloni's collaborations.

Top Co-Authors

Giovanni Costantini
University of Rome Tor Vergata

Massimiliano Todisco
University of Rome Tor Vergata

Carlo Giovannella
University of Rome Tor Vergata

Ciro Sementina
Sapienza University of Rome

Davide Conflitti
University of Rome Tor Vergata

Riccardo Santoboni
University of Rome Tor Vergata

F. Greco
Fondazione Ugo Bordoni