Solange Rossato
University of Grenoble
Publication
Featured research published by Solange Rossato.
2013 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2013
Frédéric Aman; Michel Vacher; Solange Rossato; François Portet
By 2050, about a third of the French population will be over 65. In the context of developing technologies that help older people live independently at home, the CIRDO project aims at implementing an ASR system in a social inclusion product designed for elderly people, in order to detect distress situations. Speech recognition systems have a higher word error rate (WER) on speech uttered by elderly speakers than on non-aged voices. Two specialized French corpora, AD80 and ERES38, were recorded by aged people in this framework; they were first used to study whether standard ASR could be adapted to aged voices. We then examined whether the inter-speaker variability of the WER could be correlated with the level of dependence. Finally, we assessed the performance of distress sentence detection by a filter and demonstrated a significant drop in performance for the speakers with the lowest degree of autonomy.
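The WER discussed above is conventionally defined as (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment of the recognizer's hypothesis against an N-word reference. The following is a minimal Python sketch of that standard definition, not the scoring tool used in the paper; the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via a word-level Levenshtein alignment.

    Illustrative sketch only; not the scoring tool used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # 0 if the words match
            deletion = prev[j] + 1                 # reference word dropped
            insertion = curr[j - 1] + 1            # extra hypothesis word
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical distress call with one substituted word: 1 error / 4 words.
print(word_error_rate("au secours aidez moi", "au secours aide moi"))  # 0.25
```

Per-speaker WERs computed this way are the kind of values whose variability the abstract relates to the speakers' level of dependence.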
2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2015
Michel Vacher; Benjamin Lecouteux; Javier Serrano Romero; Moez Ajili; François Portet; Solange Rossato
In voice-controlled multi-room smart homes, ASR and speaker identification systems face distant speech conditions, which have a significant impact on performance. Regarding voice command recognition, this paper presents an approach that dynamically selects the best channel and adapts the models to the environmental conditions. The method was tested on data recorded with 11 elderly and visually impaired participants in a real smart home. The voice command recognition error rate was 3.2% in the offline condition and 13.2% in the online condition. For speaker identification, performance was lower and highly speaker-dependent. However, we show a strong correlation between performance and training set size. The main difficulty was that the utterances were much shorter than in state-of-the-art studies. Moreover, speaker identification performance depends on the size of the adaptation corpus, so users must record enough data before using the system.
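The abstract does not state how the best channel is chosen. The sketch below assumes one plausible criterion: pick the microphone with the highest estimated signal-to-noise ratio (SNR). Each channel's noise level is measured on a noise-only segment preceding the utterance. The SNR criterion and every function name here are assumptions for illustration, not the paper's method.

```python
import math
from typing import List, Sequence, Tuple


def energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Rough SNR estimate in dB: utterance energy vs. noise-only segment energy."""
    floor = 1e-12  # guards against log(0) and division by zero on silent input
    return 10.0 * math.log10(max(energy(speech), floor) / max(energy(noise), floor))


def select_best_channel(channels: List[Tuple[Sequence[float], Sequence[float]]]) -> int:
    """Return the index of the channel with the highest estimated SNR.

    Assumption: `channels` holds one (speech_samples, noise_samples) pair per
    microphone, and SNR is the selection criterion (hypothetical).
    """
    scores = [snr_db(speech, noise) for speech, noise in channels]
    return max(range(len(scores)), key=scores.__getitem__)
```

In a multi-room home, the room where the user is speaking usually yields the cleanest signal, which is why a per-utterance choice (rather than a fixed microphone) is attractive. The decoded channel would then be passed to the adapted recognizer.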
International Conference on Acoustics, Speech, and Signal Processing | 2011
Juliette Kahn; Nicolas Audibert; Solange Rossato; Jean-François Bonastre
This paper describes the participation of the LIA in the Human Assisted Speaker Recognition (HASR) task of the NIST-SRE 2010 evaluation campaign and its extension to a larger number of listeners. Human performance under such unfavorable conditions is analyzed in relation to the decisions of an automatic speaker recognition system. The results of the perception test showed large inter-trial variability (from 3% to 90% of correct answers for non-target trials), whereas there was no significant difference between experienced and inexperienced listeners. Some complementarity between the speaker verification system and human decisions was also found.
International Conference on Human-Computer Interaction | 2015
Michel Vacher; Frédéric Aman; Solange Rossato; François Portet
Vocal command may offer considerable usability advantages in the Ambient Assisted Living (AAL) domain. However, efficient audio analysis in a smart home environment is challenging, largely because speech recognition performs poorly for elderly people. Dedicated speech corpora were recorded and used to adapt generic speech recognizers to this population. The evaluation results of a first experiment allowed us to draw conclusions about distress call detection. A second experiment involved participants who played fall scenarios in a realistic smart home; 67% of the distress calls were detected online. These results show the difficulty of the task and serve as a basis for discussing the stakes and challenges of this promising technology for AAL.
Iberoamerican Congress on Pattern Recognition | 2015
Moez Ajili; Jean-François Bonastre; Solange Rossato; Juliette Kahn; Itshak Lapidot
In forensic voice comparison, it is strongly recommended to follow the Bayesian paradigm when presenting forensic evidence to the court. In this paradigm, the strength of the forensic evidence is summarized by a likelihood ratio (LR). However, in practice, relying on the LR alone without considering its degree of reliability does not allow experts to form a sound judgement. This work is mainly motivated by the need to quantify this reliability. To this end, we argue that both the presence of speaker-specific information and its homogeneity between the two signals being compared should be evaluated. This paper focuses on the latter, homogeneity. We propose an information-theory-based homogeneity measure that determines whether a voice comparison is feasible.
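For reference, the standard relations of the Bayesian paradigm mentioned above are written out below. The LR compares the probability of the evidence $E$ under the prosecution (same-speaker) hypothesis $H_p$ and the defence (different-speakers) hypothesis $H_d$. The court, not the expert, combines it with the prior odds. The symbols are notation chosen here for illustration.

```latex
\mathrm{LR} = \frac{p(E \mid H_p)}{p(E \mid H_d)},
\qquad
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
= \mathrm{LR} \times
\underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
```

An LR above 1 supports $H_p$ and an LR below 1 supports $H_d$. Nothing in this ratio by itself says whether the two recordings contained enough homogeneous, speaker-specific information for the comparison to be meaningful, which is the gap the proposed homogeneity measure addresses.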
Conference of the International Speech Communication Association | 2015
Michel Vacher; Benjamin Lecouteux; Frédéric Aman; Solange Rossato; François Portet
This paper presents a system to recognize distress speech in the homes of seniors in order to provide reassurance and assistance. The system is intended to be integrated into a larger Ambient Assisted Living (AAL) system and uses only one microphone at a fixed position in a non-intimate room. The paper presents the details of the automatic speech recognition system, which must work under distant speech conditions and with expressive speech. Moreover, privacy is ensured by running the decoding on-site rather than on a remote server. Furthermore, the system was biased to recognize only a set of sentences defined after a user study. The system was evaluated in a smart space reproducing a typical living room, where 17 participants played scenarios including falls during which they uttered distress calls. The results showed a promising error rate of 29% while emphasizing the challenges of the task. Index Terms: Smart home, Vocal distress call, Applications of speech technology for Ambient Assisted Living
International Conference on Acoustics, Speech, and Signal Processing | 2017
Moez Ajili; Jean-François Bonastre; Waad Ben Kheder; Solange Rossato; Juliette Kahn
Forensic Voice Comparison (FVC) increasingly uses the likelihood ratio (LR) to indicate whether the evidence supports the prosecution (same-speaker) or defence (different-speakers) hypothesis. Nevertheless, the LR is subject to practical limitations, due both to its estimation process itself and to a lack of knowledge about the reliability of this (practical) estimation process. This is particularly true when FVC relies on Automatic Speaker Recognition (ASR) systems. Indeed, the LR estimation performed by ASR systems ignores several factors, such as speaker intrinsic characteristics (denoted the “speaker factor”), the amount of information involved in the comparison, and the phonological content. This article focuses on the impact of phonological content on FVC involving two different speakers, and more precisely on the potential role of a specific phonemic category in wrongful conviction cases (where innocent people are sent to prison). We show that even though the vast majority of speaker pairs (more than 90%) are well discriminated, a few pairs are difficult to distinguish. For the “best” discriminated pairs, all phonemic content plays a positive role in speaker discrimination, while for the “worst” pairs, nasals appear to have a negative effect and lead to confusion between speakers.
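To make the point about ASR-based LR estimation concrete, here is a minimal sketch of one generic score-to-LR mapping. It fits a Gaussian to same-speaker calibration scores and another to different-speaker calibration scores, then takes the ratio of the two densities at the case score. The Gaussian model, the function names and the calibration scores are assumptions for illustration; the abstract does not say which calibration method the authors used.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Natural log of the normal density N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def fit_gaussian(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of a set of calibration scores."""
    return mean(scores), stdev(scores)


def log10_lr(score: float,
             same_speaker_scores: Sequence[float],
             diff_speaker_scores: Sequence[float]) -> float:
    """log10 LR of a comparison score under a two-Gaussian score model (assumed).

    Positive values support the same-speaker (prosecution) hypothesis,
    negative values the different-speakers (defence) hypothesis.
    """
    mu_s, sd_s = fit_gaussian(same_speaker_scores)
    mu_d, sd_d = fit_gaussian(diff_speaker_scores)
    ln_lr = gaussian_logpdf(score, mu_s, sd_s) - gaussian_logpdf(score, mu_d, sd_d)
    return ln_lr / math.log(10)
```

In this mapping the LR depends only on the score. Factors such as the speaker, the amount of information and the phonological content of the recordings never enter the computation, which is the kind of limitation the paper examines.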
International Conference on Acoustics, Speech, and Signal Processing | 2010
Juliette Kahn; Solange Rossato; Jean-François Bonastre
During the last decade, speaker verification systems have made significant progress and have reached a level of performance and accuracy that supports their use in practical applications, including forensic ones. This context emphasizes the importance of analyzing system performance more deeply than basic error rates. In this paper, the influence of the speaker (his/her ‘voice’) on performance is studied, and the effect of the model (the training excerpt) is investigated. The experimental setup is based on an open-source system and the experimental context of NIST-SRE 2008. The results confirm that the lowest performance is concentrated on a small number of speakers. Even more than on the speaker factor, speaker verification performance is shown to depend heavily on the voice samples used to train the speaker models.
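The per-speaker breakdown described above can be sketched as follows. The code counts errors per speaker and reports the cumulative share of all errors explained by the worst speakers. The trial format `(speaker_id, is_error)` and the function name are hypothetical simplifications; this does not reproduce the paper's NIST-SRE 2008 analysis.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def error_share_by_speaker(
    trials: Iterable[Tuple[str, bool]]
) -> List[Tuple[str, int, float]]:
    """Rank speakers by error count, with the cumulative share of all errors.

    `trials` is a hypothetical list of (speaker_id, is_error) outcomes.
    Returns (speaker_id, n_errors, cumulative_share) from worst to best.
    """
    errors = Counter(spk for spk, is_error in trials if is_error)
    total = sum(errors.values())
    result = []
    cumulative = 0
    for spk, n in errors.most_common():  # ties keep first-seen order
        cumulative += n
        result.append((spk, n, cumulative / total))
    return result
```

If the first few rows already reach a large cumulative share, errors are concentrated on a few speakers, which is the pattern the abstract reports. Grouping by training excerpt instead of speaker would probe the model effect in the same way.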
Odyssey 2018 The Speaker and Language Recognition Workshop | 2018
Moez Ajili; Solange Rossato; Dan Zhang; Jean-François Bonastre
It is common to see voice recordings presented as forensic traces in court. Generally, a forensic expert is asked to analyze both the suspect's and the criminal's voice samples in order to indicate whether the evidence supports the prosecution (same-speaker) or defence (different-speakers) hypothesis. This process is known as Forensic Voice Comparison (FVC). Since the emergence of the DNA typing model, the likelihood-ratio (LR) framework has become the new gold standard in forensic sciences. The LR not only supports one of the hypotheses but also quantifies the strength of that support. However, the LR is subject to practical limitations due to its estimation process itself. This is particularly true for Automatic Speaker Recognition (ASpR) systems, since they output a score in all situations regardless of case-specific conditions. Indeed, the estimation process does not take several factors into account, such as the quality and quantity of information in both voice recordings, their phonological content, or the speakers' intrinsic characteristics. All these factors call into question the validity and reliability of FVC. In our recent study, we showed that intra-speaker variability explains 2/3 of the system losses. In this article, we investigate the relations between intra-speaker variability and rhythmic parameters.
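The abstract does not list which rhythmic parameters were studied. As an illustration only, here is a sketch of three widely used interval-based rhythm metrics introduced by Ramus, Nespor and Mehler:

* %V: the proportion of utterance duration that is vocalic.
* ΔV: the standard deviation of vocalic interval durations.
* ΔC: the standard deviation of consonantal interval durations.

Whether these are the parameters used in the paper is an assumption. The segmentation into vocalic (V) and consonantal (C) intervals is taken as given, and the function name is hypothetical.

```python
from statistics import pstdev
from typing import Dict, List, Tuple


def rhythm_metrics(intervals: List[Tuple[str, float]]) -> Dict[str, float]:
    """Compute %V, deltaV and deltaC from a labelled interval segmentation.

    `intervals` holds (label, duration_in_seconds) pairs with label "V"
    (vocalic) or "C" (consonantal); both kinds must be present.
    Population standard deviation is used here (an assumption).
    """
    v = [d for label, d in intervals if label == "V"]
    c = [d for label, d in intervals if label == "C"]
    total = sum(v) + sum(c)
    return {
        "%V": 100.0 * sum(v) / total,
        "deltaV": pstdev(v),
        "deltaC": pstdev(c),
    }
```

Computed per recording, such values could be compared across sessions of the same speaker, which is the kind of relation between rhythm and intra-speaker variability the article investigates.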
International Conference on Biometrics | 2016
Nivedita Yadav; Solange Rossato; Juliette Kahn; Jean-François Bonastre
Speaker voice characteristics are an important aspect of forensic phonetics. Previous studies have suggested that not all features present in the speech signal are equally important for speaker discrimination, and it is well known that some subsets of phonemes are more informative than others. However, most of these studies have considered a whole group of speakers without taking speaker specificities into account. This paper presents a framework for selecting a subset of phonemes at the speaker level, in order to capture speaker variability in the selection process. We present the approach for selecting the most discriminatory phonemes, and a preliminary study was performed on a French read-speech database. At the global level, the most discriminatory phonemes are compared with those reported in previous studies. At the speaker level, we examined inter-speaker variability in terms of each speaker's most discriminatory phonemes. The experimental results confirmed the effectiveness of the proposed framework.
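The abstract does not specify the discrimination criterion used to select phonemes. One classical choice, assumed here purely for illustration, is a two-class Fisher ratio computed at the speaker level. For each phoneme, a scalar acoustic feature of the target speaker is compared with the same feature pooled over all other speakers, and phonemes are ranked from most to least discriminative. The data layout, the feature and the function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def fisher_score(target_values: Sequence[float], other_values: Sequence[float]) -> float:
    """Two-class Fisher ratio: squared mean difference over summed variances."""
    spread = pvariance(target_values) + pvariance(other_values)
    return (mean(target_values) - mean(other_values)) ** 2 / max(spread, 1e-12)


def rank_phonemes_for_speaker(
    features: Dict[str, Dict[str, List[float]]], target: str
) -> List[str]:
    """Rank phonemes by how well they separate `target` from the other speakers.

    `features[phoneme][speaker]` is a hypothetical list of scalar feature
    values (e.g. a formant frequency in Hz) for that phoneme and speaker.
    """
    scores: Dict[str, float] = {}
    for phoneme, by_speaker in features.items():
        if target not in by_speaker:
            continue  # the target speaker never produced this phoneme
        others = [v for spk, vals in by_speaker.items() if spk != target for v in vals]
        if not others:
            continue  # nothing to compare against
        scores[phoneme] = fisher_score(by_speaker[target], others)
    return sorted(scores, key=scores.get, reverse=True)
```

Running the ranking for each speaker in turn yields one ordering per speaker, so differences between those orderings are a simple way to look at the inter-speaker variability the abstract mentions. Averaging scores over speakers gives a global ordering comparable to group-level studies.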