Publication


Featured research published by Rosa González Hautamäki.


Speech Communication | 2015

Automatic versus human speaker verification: The case of voice mimicry

Rosa González Hautamäki; Tomi Kinnunen; Ville Hautamäki; Anne-Maria Laukkanen

In this work, we compare the performance of three modern speaker verification systems and non-expert human listeners in the presence of voice mimicry. Our goal is to gain insights into how vulnerable speaker verification systems are to mimicry attack and compare it to the performance of human listeners. We study both traditional Gaussian mixture model-universal background model (GMM-UBM) and an i-vector based classifier with cosine scoring and probabilistic linear discriminant analysis (PLDA) scoring. For the studied material in the Finnish language, the mimicry attack slightly decreased the equal error rate (EER) for GMM-UBM from 10.83 to 10.31, while for i-vector systems the EER increased from 6.80 to 13.76 and from 4.36 to 7.38. The performance of the human listening panel shows that imitated speech increases the difficulty of the speaker verification task. It is even more difficult to recognize a person who is intentionally concealing his or her identity. For Impersonator A, the average listener made 8 errors from 34 trials while the automatic systems had 6 errors in the same set. The average listener for Impersonator B made 7 errors from the 28 trials, while the automatic systems made 7 to 9 errors. A statistical analysis of the listener performance was also conducted. We found a statistically significant association, with p = 0.00019 and R² = 0.59, between listener accuracy and self-reported factors only when familiar voices were present in the test.


International Conference on Acoustics, Speech, and Signal Processing | 2017

RedDots replayed: A new replay spoofing attack corpus for text-dependent speaker verification research

Tomi Kinnunen; Sahidullah; Mauro Falcone; Luca Costantini; Rosa González Hautamäki; Dennis Alexander Lehmann Thomsen; Achintya Kumar Sarkar; Zheng-Hua Tan; Héctor Delgado; Massimiliano Todisco; Nicholas W. D. Evans; Ville Hautamäki; Kong Aik Lee

This paper describes a new database for the assessment of automatic speaker verification (ASV) vulnerabilities to spoofing attacks. In contrast to other recent data collection efforts, the new database has been designed to support the development of replay spoofing countermeasures tailored towards the protection of text-dependent ASV systems from replay attacks in the face of variable recording and playback conditions. Derived from the re-recording of the original RedDots database, the effort is aligned with that in text-dependent ASV and thus well positioned for future assessments of replay spoofing countermeasures, not just in isolation, but in integration with ASV. The paper describes the database design and re-recording, a protocol and some early spoofing detection results. The new “RedDots Replayed” database is publicly available through a creative commons license.


Speech Communication | 2017

Acoustical and perceptual study of voice disguise by age modification in speaker verification

Rosa González Hautamäki; Sahidullah; Ville Hautamäki; Tomi Kinnunen

The task of speaker recognition is feasible when the speakers are co-operative or wish to be recognized. While modern automatic speaker verification (ASV) systems and some listeners are good at recognizing speakers from modal, unmodified speech, the task becomes notoriously difficult in situations of deliberate voice disguise when the speaker aims at masking his or her identity. We approach voice disguise from the perspective of acoustical and perceptual analysis using a self-collected corpus of 60 native Finnish speakers (31 female, 29 male) producing utterances in normal, intended young and intended old voice modes. The normal voices form a starting point and we are interested in studying how the two disguise modes impact the acoustical parameters and perceptual speaker similarity judgments. First, we study the effect of disguise as a relative change in fundamental frequency (F0) and formant frequencies (F1 to F4) from modal to disguised utterances. Next, we investigate whether or not speaker comparisons that are deemed easy or difficult by a modern ASV system have a similar difficulty level for the human listeners. Further, we study affecting factors from listener-related self-reported information that may explain a particular listener’s success or failure in speaker similarity assessment. Our acoustic analysis reveals a systematic increase in relative change in mean F0 for the intended young voices while for the intended old voices, the relative change is less prominent in most cases. Concerning the formants F1 through F4, 29% (for male) and 30% (for female) of the utterances did not exhibit a significant change in any formant value, while the remaining ~70% of utterances had significant changes in at least one formant. Our listening panel consists of 70 listeners, 32 native and 38 non-native, who listened to 24 utterance pairs selected using rankings produced by an ASV system. The results indicate that speaker pairs categorized as easy by our ASV system were also easy for the average listener. Similarly, the listeners made more errors in the difficult trials. The listening results indicate that target (same speaker) trials were more difficult for the non-native group, while the performance for the non-target pairs was similar for both native and non-native groups.


Conference of the International Speech Communication Association | 2016

Robust Speaker Recognition with Combined Use of Acoustic and Throat Microphone Speech

Md. Sahidullah; Rosa González Hautamäki; Dennis Alexander Lehmann Thomsen; Tomi Kinnunen; Zheng-Hua Tan; Ville Hautamäki; Robert Parts; Martti Pitkänen

Accuracy of automatic speaker recognition (ASV) systems degrades severely in the presence of background noise. In this paper, we study the use of additional side information provided by a body-conducted sensor, a throat microphone. The throat microphone signal is much less affected by background noise than the acoustic microphone signal. This makes throat microphones potentially useful for feature extraction or speech activity detection. First, this paper proposes a new prototype system for simultaneous data acquisition of acoustic and throat microphone signals. Second, we study the use of this additional information for speech activity detection, feature extraction, and fusion of the acoustic and throat microphone signals. We collect a pilot database consisting of 38 subjects including both clean and noisy sessions. We carry out speaker verification experiments using a Gaussian mixture model with universal background model (GMM-UBM) and an i-vector based system. We have achieved considerable improvement in recognition accuracy even in highly degraded conditions.


Odyssey 2016 | 2016

Age-Related Voice Disguise and its Impact on Speaker Verification Accuracy.

Rosa González Hautamäki; Md. Sahidullah; Tomi Kinnunen; Ville Hautamäki

This study focuses on the impact of age-related intentional voice modification, or age disguise, on the performance of automatic speaker verification (ASV) systems. The data collected for this study includes 60 native Finnish speakers (29 males, 31 females) with ages ranging from 18 to 73 years. The corpus consists of two sessions of read speech per speaker. Our experiments demonstrate the vulnerability of modern ASV systems when a person attempts to conceal his or her identity by modifying the voice to sound like an old or young person. For our i-vector PLDA system, the increase in equal error rate (EER), in the case of male speakers, was 7-fold for the attempt of old voice and 11-fold for young voice. Similar degradation in performance is observed for female speakers with a 5-fold increase in EER for old voice disguise and a 6-fold increase for young voice disguise. We further analyze the factors affecting the performance of ASV systems for the studied speech data. In our experiments, male speakers were found more successful in disguising their voices. The effect on fundamental frequency (F0) was also studied. The mean F0 distributions showed a shift towards higher frequencies when speakers attempted a young voice, which relates to the perception that younger speakers’ F0 values tend to be higher than for older speakers.


IEEE Transactions on Audio, Speech, and Language Processing | 2018

Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones

Md. Sahidullah; Dennis Alexander Lehmann Thomsen; Rosa González Hautamäki; Tomi Kinnunen; Zheng-Hua Tan; Robert Parts; Martti Pitkänen

While having a wide range of applications, automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, in particular, replay attacks that are effective and easy to implement. Most prior work on detecting replay attacks uses audio from a single acoustic microphone only, leading to difficulties in detecting high-end replay attacks close to indistinguishable from live human speech. In this paper, we study the use of a special body-conducted sensor, a throat microphone (TM), for combined voice liveness detection (VLD) and ASV in order to improve both robustness and security of ASV against replay attacks. We first investigate the possibility and methods of attacking a TM-based ASV system, followed by a pilot data collection. Second, we study the use of spectral features for VLD using both single-channel and dual-channel ASV systems. We carry out speaker verification experiments using a Gaussian mixture model with universal background model (GMM-UBM) and i-vector based systems on a dataset of 38 speakers collected by us. We have achieved considerable improvement in recognition accuracy with the use of a dual-microphone setup. In experiments with noisy test speech, the false acceptance rate (FAR) of the dual-microphone GMM-UBM based system for recorded speech reduces from 69.69% to 18.75%. The FAR for the replay condition further drops to 0% when this dual-channel ASV system is integrated with the new dual-channel voice liveness detector.


Conference of the International Speech Communication Association | 2013

I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry.

Rosa González Hautamäki; Tomi Kinnunen; Ville Hautamäki; Timo Leino; Anne-Maria Laukkanen


Conference of the International Speech Communication Association | 2013

Merging human and automatic system decisions to improve speaker recognition performance

Rosa González Hautamäki; Ville Hautamäki; Padmanabhan Rajan; Tomi Kinnunen


Proceedings of the 1st international conference on Forensic applications and techniques in telecommunications, information, and multimedia and workshop | 2008

Automatic voice activity detection in different speech applications

Marko Tuononen; Rosa González Hautamäki; Pasi Fränti


1st International ICST Conference on Forensic Applications and Techniques in Telecommunications, Information and Multimedia | 2010

Automatic Voice Activity Detection in Different Speech Applications

Marko Tuononen; Rosa González Hautamäki; Pasi Fränti

Collaboration


Dive into Rosa González Hautamäki's collaborations.

Top Co-Authors

Tomi Kinnunen

University of Eastern Finland

Ville Hautamäki

University of Eastern Finland

Md. Sahidullah

University of Eastern Finland

Marko Tuononen

University of Eastern Finland

Pasi Fränti

University of Eastern Finland

Sahidullah

University of Eastern Finland

Anssi Kanervisto

University of Eastern Finland
