Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jérôme Farinas is active.

Publications


Featured research published by Jérôme Farinas.


Speech Communication | 2005

Rhythmic unit extraction and modelling for automatic language identification

Jean-Luc Rouas; Jérôme Farinas; François Pellegrino; Régine André-Obrecht

This paper deals with an approach to Automatic Language Identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is one of the most promising features to consider for language identification, even if its extraction and modelling are not straightforward. Indeed, one of the main problems to address is what to model. In this paper, a rhythm extraction algorithm is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal and vowel duration, cluster complexity) and modelled with a Gaussian mixture. Experiments are performed on read speech for 7 languages (English, French, German, Italian, Japanese, Mandarin and Spanish), and results reach up to 86 ± 6% correct discrimination between stress-timed, mora-timed and syllable-timed classes of languages, and 67 ± 8% correct language identification on average for the 7 languages with utterances of 21 seconds. These results are discussed and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% correct identification for the 7-language identification task).
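As a rough illustration of the modelling step described above, the sketch below fits one Gaussian mixture per language over rhythmic-unit features (consonantal duration, vowel duration, cluster complexity) and identifies a test utterance by maximum log-likelihood. The two-language setup and all numeric values are invented for the example, not taken from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic rhythmic-unit features per language: [consonant duration (s),
# vowel duration (s), cluster complexity] -- illustrative values only.
lang_a = rng.normal([0.08, 0.10, 1.5], 0.02, size=(200, 3))  # "syllable-timed"
lang_b = rng.normal([0.14, 0.07, 2.5], 0.02, size=(200, 3))  # "stress-timed"

# One Gaussian mixture per language, as in the paper's modelling step.
models = {}
for name, data in [("A", lang_a), ("B", lang_b)]:
    models[name] = GaussianMixture(n_components=2, random_state=0).fit(data)

def identify(utterance_units):
    """Pick the language whose GMM gives the highest mean log-likelihood
    over the utterance's rhythmic units."""
    return max(models, key=lambda m: models[m].score(utterance_units))

# An unseen utterance drawn from the "stress-timed" distribution.
test_units = rng.normal([0.14, 0.07, 2.5], 0.02, size=(30, 3))
print(identify(test_units))  # identified as language "B"
```

In the paper the features come from automatically segmented rhythmic units rather than synthetic draws, but the decision rule (per-language GMM, maximum likelihood) is the same shape.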


international conference on acoustics, speech, and signal processing | 2003

Modeling prosody for language identification on read and spontaneous speech

Jean-Luc Rouas; Jérôme Farinas; François Pellegrino; Régine André-Obrecht

This paper deals with an approach to Automatic Language Identification using only prosodic modeling. Current approaches to language identification focus mainly on phonotactics because it gives the best results. We propose here to evaluate the relevance of prosodic information for language identification on read studio recordings (a previous experiment [1]) and on spontaneous telephone speech. For read speech, experiments were performed on the five languages of the MULTEXT database [2]. On the MULTEXT corpus, our prosodic system achieved an identification rate of 79% on the five-language discrimination task. For spontaneous speech, experiments were made on the ten languages of the OGI Multilingual Telephone Speech corpus [3]. On the OGI MLTS corpus, results are given for language-pair discrimination tasks and compared with results from [4]. In conclusion, although our prosodic system achieves good performance on read speech, it may not capture the complexity of spontaneous speech prosody.


international conference on acoustics, speech, and signal processing | 2002

Merging segmental and rhythmic features for Automatic Language Identification

Jérôme Farinas; François Pellegrino; Jean-Luc Rouas; Régine André-Obrecht

This paper deals with an approach to Automatic Language Identification based on rhythmic modeling and vowel system modeling. Experiments are performed on read speech for 5 European languages. They show that rhythm and stress may be automatically extracted and are relevant to language identification: using cross-validation, 78% correct identification is reached with 21 s utterances. The vowel system modeling, tested under the same conditions (cross-validation), is efficient and yields 70% correct identification for the 21 s utterances. Lastly, merging the two models slightly improves the results.


ACM Transactions on Accessible Computing | 2015

Automatic Assessment of Speech Capability Loss in Disordered Speech

Thomas Pellegrini; Lionel Fontan; Julie Mauclair; Jérôme Farinas; Charlotte Alazard-Guiu; Marina Robert; Peggy Gatignol

In this article, we report on the use of an automatic technique to assess pronunciation in the context of several types of speech disorders. Even if such tools already exist, they are more widely used in a different context, namely, Computer-Assisted Language Learning, in which the objective is to assess nonnative pronunciation by detecting learners’ mispronunciations at segmental and/or suprasegmental levels. In our work, we sought to determine if the Goodness of Pronunciation (GOP) algorithm, which aims to detect phone-level mispronunciations by means of automatic speech recognition, could also detect segmental deviances in disordered speech. Our main experiment is an analysis of speech from people with unilateral facial palsy. This pathology may impact the realization of certain phonemes such as bilabial plosives and sibilants. Speech read by 32 speakers at four different clinical severity grades was automatically aligned and GOP scores were computed for each phone realization. The highest scores, which indicate large dissimilarities with standard phone realizations, were obtained for the most severely impaired speakers. The corresponding speech subset was manually transcribed at phone level; 8.3% of the phones differed from standard pronunciations extracted from our lexicon. The GOP technique allowed the detection of 70.2% of mispronunciations with an equal rate of about 30% of false rejections and false acceptances. Finally, to broaden the scope of the study, we explored the correlation between GOP values and speech comprehensibility scores on a second corpus, composed of sentences recorded by six people with speech impairments due to cancer surgery or neurological disorders. Strong correlations were achieved between GOP scores and subjective comprehensibility scores (about 0.7 absolute). 
Results from both experiments tend to validate the use of GOP to measure speech capability loss, a dimension that could be used as a complement to physiological measures in pathologies causing speech disorders.
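The GOP measure used in this article is, in its standard formulation, a duration-normalized difference between the acoustic log-likelihood of the expected phone and that of the best competing phone, so that large values indicate large dissimilarities with standard realizations. A minimal sketch with hypothetical per-phone log-likelihoods (the values below are invented; a real system obtains them from forced alignment with an ASR acoustic model):

```python
def gop(frame_loglik, expected_phone, duration):
    """Goodness Of Pronunciation score: normalized difference between the
    log-likelihood of the expected phone and that of the best-scoring
    phone over the same frames. 0 means the expected phone was the best
    match; larger values mean larger deviance."""
    expected = frame_loglik[expected_phone]
    best = max(frame_loglik.values())
    return abs(expected - best) / duration

# Hypothetical acoustic log-likelihoods for one phone realization,
# accumulated over its aligned frames (duration = 12 frames).
scores = {"p": -42.0, "b": -40.5, "m": -47.0}
print(gop(scores, "p", duration=12))  # 0.125: mild deviance, [b] fits better
print(gop(scores, "b", duration=12))  # 0.0: expected phone is the best match
```

Thresholding such scores per phone is then what separates acceptable realizations from mispronunciations, as in the facial-palsy experiment above.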


Journal of Speech Language and Hearing Research | 2017

Automatic Speech Recognition Predicts Speech Intelligibility and Comprehension for Listeners With Simulated Age-Related Hearing Loss

Lionel Fontan; Isabelle Ferrané; Jérôme Farinas; Julien Pinquier; Julien Tardieu; Cynthia Magnen; Pascal Gaillard; Xavier Aumont; Christian Füllgrabe

Purpose: The purpose of this article is to assess speech processing for listeners with simulated age-related hearing loss (ARHL) and to investigate whether the observed performance can be replicated using an automatic speech recognition (ASR) system. The long-term goal of this research is to develop a system that will assist audiologists/hearing-aid dispensers in the fine-tuning of hearing aids.

Method: Sixty young participants with normal hearing listened to speech materials mimicking the perceptual consequences of ARHL at different levels of severity. Two intelligibility tests (repetition of words and sentences) and one comprehension test (responding to oral commands by moving virtual objects) were administered. Several language models were developed and used by the ASR system in order to fit human performance.

Results: Strong significant positive correlations were observed between human and ASR scores, with coefficients up to .99. However, the spectral smearing used to simulate losses in frequency selectivity caused larger declines in ASR performance than in human performance.

Conclusion: Both intelligibility and comprehension scores for listeners with simulated ARHL are highly correlated with the performances of an ASR-based system. In the future, it needs to be determined if the ASR system is similarly successful in predicting speech processing in noise and by older people with ARHL.


conference of the international speech communication association | 2016

Using Phonologically Weighted Levenshtein Distances for the Prediction of Microscopic Intelligibility

Lionel Fontan; Isabelle Ferrané; Jérôme Farinas; Julien Pinquier; Xavier Aumont

This article presents a new method for analyzing Automatic Speech Recognition (ASR) results at the phonological feature level. To this end, the Levenshtein distance algorithm is refined to take into account the distinctive features opposing substituted phonemes. This method makes it possible to survey feature additions and deletions, providing microscopic qualitative information as a complement to word recognition scores. To explore the relevance of the qualitative data gathered by this method, a study is conducted on a speech corpus simulating presbycusis effects on speech perception at eight severity stages. Consonantal feature additions and deletions in ASR outputs are analyzed and related to intelligibility data collected from 30 human subjects. ASR results show monotonic trends in most consonantal features along the degradation conditions, which appear to be consistent with the misperceptions observed in human subjects.
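The refinement described above can be sketched as a Levenshtein distance whose substitution cost is the proportion of distinctive features on which two phonemes disagree, instead of a flat cost of 1. The toy binary feature vectors below (voice, place, manner) are invented for illustration and are not the feature set used in the paper:

```python
def weighted_levenshtein(ref, hyp, features):
    """Levenshtein distance where substituting phoneme a for phoneme b
    costs the fraction of distinctive features on which they disagree
    (between 0 and 1); insertions and deletions keep cost 1."""
    def sub_cost(a, b):
        fa, fb = features[a], features[b]
        return sum(x != y for x, y in zip(fa, fb)) / len(fa)

    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]))
    return d[n][m]

# Toy feature vectors: (voice, place, manner) -- illustrative only.
feats = {"p": (0, 0, 0), "b": (1, 0, 0), "t": (0, 1, 0), "s": (0, 1, 1)}
print(weighted_levenshtein("pt", "bt", feats))  # 1/3: p->b differs in voicing only
print(weighted_levenshtein("pt", "st", feats))  # 2/3: p->s differs in two features
```

Comparing which features drive the substitution costs is what yields the "microscopic" qualitative information mentioned in the abstract.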


conference of the international speech communication association | 2016

Pronunciation assessment of Japanese learners of French with GOP scores and phonetic information

Vincent Laborde; Thomas Pellegrini; Lionel Fontan; Julie Mauclair; Halima Sahraoui; Jérôme Farinas

In this paper, we report automatic pronunciation assessment experiments at phone level on a read speech corpus in French, collected from 23 Japanese speakers learning French as a foreign language. We compare the standard approach, based on Goodness Of Pronunciation (GOP) scores and phone-specific score thresholds, to the use of logistic regression (LR) models. A French native speech corpus, in which artificial pronunciation errors were introduced, was used as the training set. Two typical errors of Japanese speakers were considered: /o/ and /v/, often mispronounced as [l] and [b], respectively. The LR classifier achieved 64.4% accuracy, similar to the 63.8% accuracy of the baseline threshold method, when using only GOP scores and the expected phone identity as input features. A significant relative performance gain of 20.8% was obtained by adding phonetic and phonological features as input to the LR model, leading to 77.1% accuracy. This LR model also outperformed another baseline approach based on linear discriminant models trained on raw f-BANK coefficient features.
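A minimal sketch of the logistic-regression setup described above, with the GOP score and simple phone features as inputs to a binary mispronunciation classifier. All data here are synthetic and the feature names are placeholders; in the paper the GOP scores come from an ASR system and the added features are phonetic/phonological properties of the expected phone.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400

# Synthetic training data: low GOP scores for correct realizations,
# higher scores for mispronunciations -- illustrative values only.
gop_scores = np.concatenate([rng.normal(0.5, 0.3, n),   # correct phones
                             rng.normal(2.0, 0.5, n)])  # mispronounced
phone_id = rng.integers(0, 2, 2 * n)   # placeholder expected-phone identity
is_vowel = rng.integers(0, 2, 2 * n)   # placeholder phonological feature
X = np.column_stack([gop_scores, phone_id, is_vowel])
y = np.concatenate([np.zeros(n), np.ones(n)])  # 1 = mispronounced

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))  # well above chance: GOP separates the classes here
```

Unlike per-phone thresholding, the LR model can weight the GOP score together with any number of extra features, which is the mechanism behind the reported accuracy gain.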


content based multimedia indexing | 2008

Automatic low-dimensional analysis of audio databases

José Anibal Arias; Régine André-Obrecht; Jérôme Farinas

In this paper we present an approach designed to map variable-size audio sequences into fixed-length vectors, useful for discovering the contents of audio databases. First, we model standard audio parameters with Gaussian mixture models (GMM). Then, symmetric Kullback-Leibler divergences between models are approximated with a Monte-Carlo method. We use these statistical dissimilarities to find a low-dimensional representation of each audio sequence through the multidimensional scaling (MDS) algorithm. Vectors in low-dimensional spaces are then easily explored with kernel and clustering methods. Experiments carried out on different kinds of audio databases (music, speakers and languages) show the good potential of the proposed approach and provide a framework for more challenging applications.
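The pipeline described above (one GMM per sequence, Monte-Carlo symmetric Kullback-Leibler divergences, then an MDS embedding) can be sketched as follows. The 2-D synthetic sequences stand in for real acoustic parameter streams such as cepstral coefficients:

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Three synthetic "audio sequences" of 300 frames each; two are similar,
# one is distant -- stand-ins for variable-size acoustic feature streams.
seqs = [rng.normal(mu, 1.0, size=(300, 2)) for mu in (0.0, 0.5, 4.0)]
gmms = [GaussianMixture(n_components=2, random_state=0).fit(s) for s in seqs]

def sym_kl(g1, g2, n_samples=2000):
    """Monte-Carlo estimate of the symmetric Kullback-Leibler divergence
    between two GMMs: sample from each model and average the
    log-likelihood ratios."""
    x1, _ = g1.sample(n_samples)
    x2, _ = g2.sample(n_samples)
    kl12 = np.mean(g1.score_samples(x1) - g2.score_samples(x1))
    kl21 = np.mean(g2.score_samples(x2) - g1.score_samples(x2))
    return kl12 + kl21

# Pairwise dissimilarity matrix, then a 2-D embedding via MDS.
n = len(gmms)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = sym_kl(gmms[i], gmms[j])

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
print(coords.shape)  # (3, 2): one fixed-length vector per sequence
```

The resulting fixed-length vectors can then be fed to standard kernel or clustering methods, which is the point of the mapping.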


Journal of the Acoustical Society of America | 2006

Cepstral coefficients and hidden Markov models reveal idiosyncratic voice characteristics in red deer (Cervus elaphus) stags

David Reby; Régine André-Obrecht; Arnaud Galinier; Jérôme Farinas; Bruno Cargnelutti


conference of the international speech communication association | 2001

Automatic Rhythm Modeling for Language Identification

Jérôme Farinas; François Pellegrino

Collaboration


Dive into Jérôme Farinas's collaborations.

Top Co-Authors

Jean-Luc Rouas

Centre national de la recherche scientifique

Régine André-Obrecht

Centre national de la recherche scientifique

Julie Mauclair

Paris Descartes University
